Abstract

This study proposes a robust estimator for stochastic frontier models by integrating the idea of Basu
et al. [1998, Biometrika 85, 549-559] into such models. We verify that the suggested estimator is strongly
consistent and asymptotically normal under regularity conditions and investigate its robust properties. We use
a simulation study to demonstrate that the estimator has strong robust properties with little loss in
asymptotic efficiency relative to the maximum likelihood estimator. A real data analysis is performed
to illustrate the use of the estimator.
1 Introduction

Technical efficiency (TE) measures have been used for several decades for benchmarking purposes. The 
concept of TE was first introduced by Farrell (1957). Since then, two strands of TE measurement developed 
in the late 1970s and early 1980s: data envelopment analysis (DEA), based on linear programming, and 
stochastic frontier analysis (SFA), which commonly uses parametric stochastic frontier (SF) models.

The DEA technique is mainly used to measure TE scores in the research fields of managerial and economics
studies. Since DEA often requires only input and output quantities, it is quite easy to understand the 
technique’s empirical results and to apply these results to any empirical investigations. However, a weakness 
of DEA is that it is sensitive to extreme values, making it difficult to apply the technique to data sets with 
outliers. Several attempts have been made to solve this problem. For example, Wilson (1993, 1995) suggested
a method for detecting outliers and Cazals et al. (2002) proposed a robust estimator for the nonparametric 
frontier model. Simar (2003) employed the method of Cazals et al. (2002) to detect outliers using classic 
DEA estimators. Florens and Simar (2005) also proposed robust parametric estimators of nonparametric 
frontiers. 

The SFA framework is a counterpart to the DEA in that it is a parametric approach. This means that
the functional form, such as production or cost functions, needs to be assumed before estimating the TE
score. One of the pioneering methodologies in the SFA framework was developed by Jondrow et al. (1982),
who proposed a formula for separating a random error component and a TE component. Owing to the ease
of application, various models have been developed and SF models have been widely employed in efficiency
measurement studies. For example, the approach suggested by Battese and Coelli (1995) provides the TE
and the determinants of the TE. Numerous statistical methods have been proposed for estimating SF models.
For example, Park and Simar (1994) and Park et al. (1998) considered semiparametric estimation in SF
panel models and Kumbhakar et al. (2007) introduced an approach for nonparametric SF models. Kopp 
and Mullahy (1990) and Van den Broeck et al. (1994) applied the generalized method of moments procedure 
and Bayesian method, respectively, to parametric SF models. Kneip et al. (2015) proposed an alternative 
and new approach for nonparametric SF models using penalized likelihood. 

This study addresses the estimation of parametric SF models, particularly in the presence of high- or 
low-performing observations. In empirical data analyses, one often faces observations with a comparative 
advantage, such as highly advanced technology, which yield a super efficiency score. These observations 
should be treated carefully because they can influence the estimation procedure in the same way as outliers 
do. As is widely recognized in the literature, the maximum likelihood (ML) estimation method is influenced 
strongly by outliers or extreme values. Our simulation shows that applying the ML estimator to the SF 
model suffers from the same problem, requiring the development of a robust estimation method for SF 
models. However, to the best of our knowledge, little effort has been made in this regard. 

The purpose of this study is to propose a robust estimator for SF models. To construct a robust 
estimator, we consider the estimation method based on divergence, which evaluates the discrepancy between 
any two probability distributions. The divergence-based estimation method has been used successfully in
constructing robust estimators in the past. For a review, refer to Pardo (2006) and Cichocki and Amari 
(2010), as well as the references therein. In this study, we employ density power divergence, as proposed 
by Basu et al. (1998) (henceforth, BHHJ). BHHJ proposed a minimum density power divergence (MDPD) 
estimator, and demonstrated that it possesses, relative to the ML estimator, strong robust properties with 
little loss in asymptotic efficiency. Compared with other robust methods, such as the minimum Hellinger
distance estimation, the BHHJ method does not require any smoothing methods. Hence, it avoids the
difficulty of selecting a bandwidth for nonparametric density estimation. For this reason,
the BHHJ method can be applied conveniently to any parametric models to which the ML estimation can
be applied. For example, see Juarez and Schucany (2004), Fujisawa and Eguchi (2006), and Kim and Lee 
(2013). 

The remainder of the paper is organized as follows. Section 2 reviews the BHHJ estimation method and 
proposes a robust estimator for SF models based on density power divergence. This section also examines 
the asymptotic and robust properties of the proposed estimator. In Section 3, we discuss our simulation 
study that compares the performance of the conventional ML estimator and the MDPD estimator in the 
SFA framework. In Section 4, we analyze real data that contain some low-performing observations using 
both estimators, again for comparative purposes. Lastly, Section 5 concludes the paper. 

2 Robust estimation in the stochastic frontier models 

This section reviews the MDPD estimator and integrates it into the SFA framework in order to estimate the 
TE. 

2.1 Minimum density power divergence estimator 

In this subsection, we review the BHHJ estimation procedure that minimizes a density-based divergence 
measure. 

Let $f$ and $g$ be probability densities. To measure the difference between $f$ and $g$, BHHJ defined the
density power divergence, $d_\alpha(g, f)$, as follows:

$$
d_\alpha(g, f) =
\begin{cases}
\displaystyle \int \left\{ f^{1+\alpha}(z) - \left(1 + \frac{1}{\alpha}\right) g(z) f^{\alpha}(z) + \frac{1}{\alpha} g^{1+\alpha}(z) \right\} dz, & \alpha > 0, \\[2ex]
\displaystyle \int g(z) \left\{ \log g(z) - \log f(z) \right\} dz, & \alpha = 0.
\end{cases}
\tag{1}
$$

Note that the divergence includes the Kullback-Leibler divergence ($\alpha = 0$) and the $L_2$-distance ($\alpha = 1$) as special cases. Since $d_\alpha(g, f)$
converges to $d_0(g, f)$ as $\alpha \to 0$, the above divergence with $0 < \alpha < 1$ provides a smooth bridge between the
Kullback-Leibler divergence and the $L_2$-distance.
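The two special cases can be checked numerically. The following is a minimal sketch, not part of the original study: it assumes two illustrative normal densities and uses SciPy quadrature on a truncated range, with the helper name `dpd` chosen here for illustration.

```python
import numpy as np
from scipy import integrate, stats

def dpd(g, f, alpha, lo=-15.0, hi=15.0):
    """Density power divergence d_alpha(g, f), computed by numerical integration."""
    if alpha == 0:
        # Kullback-Leibler case
        integrand = lambda z: g(z) * (np.log(g(z)) - np.log(f(z)))
    else:
        integrand = lambda z: (f(z) ** (1 + alpha)
                               - (1 + 1 / alpha) * g(z) * f(z) ** alpha
                               + g(z) ** (1 + alpha) / alpha)
    value, _ = integrate.quad(integrand, lo, hi, limit=200)
    return value

g = stats.norm(0.0, 1.0).pdf   # "true" density (illustrative choice)
f = stats.norm(1.0, 1.5).pdf   # model density (illustrative choice)

kl = dpd(g, f, 0.0)
l2 = integrate.quad(lambda z: (g(z) - f(z)) ** 2, -15.0, 15.0, limit=200)[0]
d_small = dpd(g, f, 1e-3)
d_one = dpd(g, f, 1.0)
print(f"alpha=1:    {d_one:.6f}  vs integrated squared difference {l2:.6f}")
print(f"alpha=1e-3: {d_small:.6f}  vs KL divergence {kl:.6f}")
```

With $\alpha = 1$ the integrand collapses to $(f - g)^2$, and for small $\alpha$ the value approaches the Kullback-Leibler divergence, which is the bridge described above.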

Consider a family of parametric distributions $\{F_\theta : \theta \in \Theta \subset \mathbb{R}^m\}$ possessing densities $\{f_\theta\}$ with respect
to the Lebesgue measure, and let $\mathcal{G}$ be the class of all distributions having densities with respect to the
Lebesgue measure. For a distribution $G \in \mathcal{G}$ with density $g$, the MDPD functional at $G$ (i.e., $T_\alpha(G)$) with
respect to $\{F_\theta : \theta \in \Theta\}$ is defined by

$$
T_\alpha(G) = \arg\min_{\theta \in \Theta} d_\alpha(g, f_\theta),
\tag{2}
$$

where it is assumed that $T_\alpha(G)$ exists and is unique, as will normally be the case. Note that when $G$ belongs
to $\{F_\theta\}$ (i.e., $G = F_{\theta'}$ for some $\theta' \in \Theta$), $T_\alpha(G)$ becomes $\theta'$. Roughly speaking, $F_{T_\alpha(G)}$ can be considered as
a projection of $G$ onto the space of $\{F_\theta : \theta \in \Theta\}$ in terms of the divergence, and $T_\alpha(G)$ becomes the target
parameter of the MDPD estimator below.

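Given a random sample $X_1, \ldots, X_n$ with unknown density $g$, the MDPD estimator for the parameter
$T_\alpha(G)$ is defined as an empirical version of (2). That is,

$$
\hat\theta_{\alpha,n} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} H_\alpha(X_i; \theta),
\tag{3}
$$

where

$$
H_\alpha(X_i; \theta) =
\begin{cases}
\displaystyle \int f_\theta^{1+\alpha}(z)\,dz - \left(1 + \frac{1}{\alpha}\right) f_\theta^{\alpha}(X_i), & \alpha > 0, \\[2ex]
-\log f_\theta(X_i), & \alpha = 0.
\end{cases}
$$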


BHHJ showed that $\hat\theta_{\alpha,n}$ is weakly consistent for $T_\alpha(G)$ and asymptotically normal, and demonstrated that
the estimator has strong robust properties. The robust property of the estimator can be understood by
checking the following estimating equation:

$$
(1+\alpha)\left\{ \int U_\theta(z) f_\theta^{1+\alpha}(z)\,dz - \frac{1}{n}\sum_{i=1}^{n} U_\theta(X_i) f_\theta^{\alpha}(X_i) \right\} = 0,
$$

where $U_\theta(x) = \frac{\partial}{\partial\theta} \log f_\theta(x)$. Comparing this with the estimating equation of the ML estimator (i.e., $\sum_{i=1}^{n} U_\theta(X_i) = 0$),
one can see that the MDPD estimator assigns the density power weight $f_\theta^{\alpha}(X_i)$ to each $U_\theta(X_i)$, whereas the
ML estimator gives equal weight to all observations. This means that the robustness of the MDPD estimator is obtained by
down-weighting the outliers. Indeed, $\alpha$ controls the trade-off between robustness and asymptotic
efficiency in the estimation procedure. In the literature that applies the BHHJ procedure to other statistical
or econometric models, the MDPD estimators show good robustness against outliers, while still having a
high efficiency relative to the ML estimator, especially when the true distribution belongs to $\{F_\theta\}$ and $\alpha$ is
close to 0. For example, Juarez and Schucany (2004) and Fujisawa and Eguchi (2006) applied the procedure
to the generalized Pareto distribution and the normal mixture distribution, respectively. Lee and Song
(2009, 2013) introduced the MDPD estimator for the GARCH and diffusion models, respectively, and Kim
and Lee (2013) employed the estimation method for the copula parameter in the SCOMDY models. Since
the estimator with $\alpha > 1$ causes a significant loss of efficiency, estimations with $\alpha \in [0, 1]$ are commonly
employed.
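The down-weighting effect can be seen in a small example. The sketch below is an illustration under assumptions that are not in the original text: a univariate normal model, a sample with 5% point contamination, and $\alpha = 0.5$. For the normal density the integral of $f_\theta^{1+\alpha}$ has the closed form $(2\pi\sigma^2)^{-\alpha/2}(1+\alpha)^{-1/2}$, so no numerical integration is needed here. The function name `mdpd_normal` is hypothetical.

```python
import numpy as np
from scipy import optimize, stats

def mdpd_normal(x, alpha):
    """MDPD estimate of (mu, sigma) under a N(mu, sigma^2) model, for alpha > 0."""
    def objective(params):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)
        # closed form of the integral of f^(1 + alpha) for a normal density
        integral = (2 * np.pi * sigma**2) ** (-alpha / 2) / np.sqrt(1 + alpha)
        return integral - (1 + 1 / alpha) * np.mean(stats.norm.pdf(x, mu, sigma) ** alpha)

    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))   # robust starting scale
    res = optimize.minimize(objective, [med, np.log(mad)], method="Nelder-Mead")
    return res.x[0], np.exp(res.x[1])

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 95), np.full(5, 10.0)])  # 5% outliers at 10

mu_ml, sigma_ml = x.mean(), x.std()
mu_dpd, sigma_dpd = mdpd_normal(x, alpha=0.5)
weights = stats.norm.pdf(x, mu_dpd, sigma_dpd) ** 0.5   # density power weights

print(f"ML:   mu={mu_ml:.3f}, sigma={sigma_ml:.3f}")
print(f"MDPD: mu={mu_dpd:.3f}, sigma={sigma_dpd:.3f}")
print(f"largest weight on an outlier: {weights[-5:].max():.2e}")
```

The outliers receive weights that are practically zero, so they barely enter the estimating equation, whereas the ML estimates of both the mean and the scale are pulled toward them.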

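This approach can be easily extended to estimation in regression models. Let $\{f_\theta(y|x)\}$ be a family of
regression models with a parameter $\theta \in \Theta$, and let $g(y|x)$ be the true density for $Y$, given $X = x$. Then, a
family of the $x$-conditional version of the density power divergence is defined as

$$
d_\alpha(g(\cdot|x), f_\theta(\cdot|x)) =
\begin{cases}
\displaystyle \int \left\{ f_\theta^{1+\alpha}(y|x) - \left(1 + \frac{1}{\alpha}\right) g(y|x) f_\theta^{\alpha}(y|x) + \frac{1}{\alpha} g^{1+\alpha}(y|x) \right\} dy, & \alpha > 0, \\[2ex]
\displaystyle \int g(y|x) \left\{ \log g(y|x) - \log f_\theta(y|x) \right\} dy, & \alpha = 0.
\end{cases}
$$

Given observations $\{(X_i, Y_i)\}_{i=1}^{n}$, the above divergence makes it possible to employ the MDPD estimators
for regression models, as follows:

$$
\hat\theta_{\alpha,n} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} H_\alpha(X_i, Y_i; \theta),
\tag{4}
$$

where

$$
H_\alpha(X_i, Y_i; \theta) =
\begin{cases}
\displaystyle \int f_\theta^{1+\alpha}(y|X_i)\,dy - \left(1 + \frac{1}{\alpha}\right) f_\theta^{\alpha}(Y_i|X_i), & \alpha > 0, \\[2ex]
-\log f_\theta(Y_i|X_i), & \alpha = 0.
\end{cases}
$$

As an alternative to the ML estimation, we apply this estimator to the SF models, as described in the next
subsection.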


2.2 The MDPD estimator for SF models 

Consider a random sample $\{(X_i, Y_i)\}_{i=1}^{n}$ with $X_i \in \mathbb{R}^p$ and $Y_i \in \mathbb{R}$, satisfying the following stochastic frontier
model:

$$
Y_i = g(X_i, \beta) + V_i - U_i, \quad i = 1, \ldots, n,
\tag{5}
$$

where $g(x, \beta)$ is the frontier production function with parameter vector $\beta$; $V_i$ and $U_i$ are the random error
term and technical inefficiency, respectively; and $\{V_i\}$ and $\{U_i\}$ are assumed to be independent.

Denoting the true density functions of $V$ and $U$ by $f_V$ and $f_U$, respectively, the true conditional density
of $Y$, given $X = x$, is obtained by

$$
f(y|x) = \int_0^{\infty} f_U(u)\, f_V(u + y - g(x, \beta))\,du.
$$

Since it is not usually easy to specify the distributions of $V$ and $U$, we consider a class of pseudo (or quasi)
distributions having parametric densities to construct the MDPD estimator. In this case, the SF model
under consideration is misspecified if the true distributions of $V$ and $U$ do not belong to the given family.
Let $f_\theta(y|x)$ be the conditional density induced from the pseudo parametric distributions. Then, the MDPD
estimator can be defined by inserting the pseudo conditional density $f_\theta(y|x)$ in the estimator given in (4),
and the pseudo parameter to be estimated is given by

$$
\theta_\alpha^* := \arg\min_{\theta \in \Theta} E\left[ d_\alpha(f(\cdot|X), f_\theta(\cdot|X)) \right],
$$

where $\Theta$ denotes the parameter space. Note that if $V$ and $U$ are correctly specified, i.e., $f(y|x) = f_{\theta_0}(y|x)$
for some $\theta_0 \in \Theta$, it holds that $\theta_\alpha^* = \theta_0$ for $\alpha \ge 0$.

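In this paper, we consider the normal distribution and the truncated-normal (or the exponential)
distribution as the pseudo distributions for $V$ and $U$, respectively. That is, our MDPD estimator for (5) is
constructed using the pseudo conditional densities below regardless of whether the true densities $f_V$ and $f_U$
belong to the assumed class or not.

• When $N(0, \sigma_v^2)$ and $N^+(\mu, \sigma_u^2)$ are employed as the pseudo distributions for $V$ and $U$, respectively, the
pseudo conditional density is given by

$$
f_\theta(y|x) = \frac{1}{\sigma}\,\phi\!\left(\frac{y - g(x,\beta) + \mu}{\sigma}\right) \left\{\Phi\!\left(\frac{\mu}{\sigma_u}\right)\right\}^{-1} \Phi\!\left(\frac{\mu}{\sigma\lambda} - \frac{(y - g(x,\beta))\,\lambda}{\sigma}\right),
\tag{6}
$$

where $\Phi(\cdot)$ and $\phi(\cdot)$ are the standard normal cumulative distribution and density functions, respectively;
$\sigma^2 = \sigma_v^2 + \sigma_u^2$ and $\lambda = \sigma_u/\sigma_v$; and $\theta$ denotes $(\beta, \mu, \sigma_u, \sigma_v)$. Note that setting $\mu = 0$, (6) reduces to the
following conditional density:

$$
f_\theta(y|x) = \frac{2}{\sigma}\,\phi\!\left(\frac{y - g(x,\beta)}{\sigma}\right) \Phi\!\left(-\frac{(y - g(x,\beta))\,\lambda}{\sigma}\right),
\tag{7}
$$

which is the conditional density of the normal - half normal SF model.

• When $N(0, \sigma_v^2)$ and $Exp(1/\sigma_u)$ are considered for the pseudo distributions of $V$ and $U$, respectively,
we have

$$
f_\theta(y|x) = \frac{1}{\sigma_u}\,\Phi\!\left(-\frac{y - g(x,\beta)}{\sigma_v} - \frac{\sigma_v}{\sigma_u}\right) \exp\!\left(\frac{y - g(x,\beta)}{\sigma_u} + \frac{\sigma_v^2}{2\sigma_u^2}\right),
\tag{8}
$$

where $\theta$ denotes $(\beta, \sigma_u, \sigma_v)$.

In the case of $\alpha = 0$, the above estimator becomes the quasi ML (QML) estimator. Hereafter, we denote by
[NT] (resp. [NE]) the case in which (6) (resp. (8)) is adopted as the pseudo conditional density. Further, we
assume that $\inf_{\theta \in \Theta} (\sigma_u \wedge \sigma_v) > 0$.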


Remark 1. To the best of our knowledge, the integral of $f_\theta^{1+\alpha}(y|x)$ in (4) with (6) or (8) cannot be expressed
in a closed form. This makes it problematic to obtain the explicit form of the above objective function.
In our simulation study, we use the numerical integration method provided in R-metrics to implement the
MDPD estimator, which seems to produce sufficiently good approximation results to estimate the parameters
(see Section 3).
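The numerical integration mentioned in Remark 1 can be sketched as follows. This is an illustrative Python version using SciPy quadrature, not the R implementation used in the study; the function names and the truncation width are assumptions made here. The sketch relies on the fact that the densities (7) and (8) depend on $y$ only through $\epsilon = y - g(x, \beta)$, so the integral of $f_\theta^{1+\alpha}$ depends on $(\sigma_u, \sigma_v)$ but not on $x$ or $\beta$.

```python
import numpy as np
from scipy import integrate
from scipy.special import ndtr, log_ndtr

def pdf_nhn(eps, sigma_u, sigma_v):
    """Normal - half normal density (7), as a function of eps = y - g(x, beta)."""
    sigma = np.hypot(sigma_u, sigma_v)
    lam = sigma_u / sigma_v
    phi = np.exp(-0.5 * (eps / sigma) ** 2) / np.sqrt(2 * np.pi)
    return 2.0 / sigma * phi * ndtr(-eps * lam / sigma)

def pdf_ne(eps, sigma_u, sigma_v):
    """Normal - exponential density (8), as a function of eps = y - g(x, beta)."""
    z = -eps / sigma_v - sigma_v / sigma_u
    return np.exp(log_ndtr(z) + eps / sigma_u + sigma_v**2 / (2 * sigma_u**2)) / sigma_u

def power_integral(pdf, alpha, sigma_u, sigma_v):
    """Integral of f^(1 + alpha); after substituting eps it is free of x and beta."""
    width = 20.0 * (sigma_u + sigma_v)   # truncation range (an assumption)
    value, _ = integrate.quad(lambda e: pdf(e, sigma_u, sigma_v) ** (1 + alpha),
                              -width, width, limit=200)
    return value

def mean_h_alpha(eps, pdf, alpha, sigma_u, sigma_v):
    """Sample mean of H_alpha over residuals eps_i = y_i - g(x_i, beta), alpha > 0."""
    dens = pdf(np.asarray(eps, dtype=float), sigma_u, sigma_v)
    return power_integral(pdf, alpha, sigma_u, sigma_v) - (1 + 1 / alpha) * np.mean(dens**alpha)

rng = np.random.default_rng(0)
eps = rng.normal(0.0, 0.5, 2000) - np.abs(rng.normal(0.0, 1.0, 2000))  # V - U
print(mean_h_alpha(eps, pdf_nhn, 0.3, 1.0, 0.5))   # at the generating values
print(mean_h_alpha(eps, pdf_ne, 0.3, 1.0, 0.5))    # a different pseudo density
```

Passing `mean_h_alpha` to a general-purpose optimizer over $(\beta, \sigma_u, \sigma_v)$, with residuals recomputed for each $\beta$, would give one possible implementation of the MDPD estimator; one quadrature call per objective evaluation is enough because the integral is shared by all observations.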


2.3 Asymptotic properties of the MDPD estimator 

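This subsection derives the asymptotic properties of the MDPD estimator for (5). We particularly
concentrate on the estimator with $\alpha > 0$. The following regularity conditions are required to establish the
consistency.

A1. The parameter space $\Theta$ is compact and the pseudo parameter $\theta_\alpha^* \in \Theta$.

A2. $\{X_i\}$ is a set of $p$-dimensional i.i.d. random vectors with density $f_X$ that are independent of $\{U_i\}$
and $\{V_i\}$.

A3. $g(x, \beta)$ is continuous in $\beta$ for all $x \in \mathbb{R}^p$.

A4. $\sup_{\theta \in \Theta} f_\theta(y|x) \le C$ for some $C$, where $C$ does not depend on $x$ and $y$.

Theorem 2.1. Let $\{(X_i, Y_i)\}_{i=1}^{n}$ be a random sample from (5) and suppose that assumptions A1-A3 hold.
If the pseudo conditional density $f_\theta(y|x)$ satisfies A4, then, for each $\alpha > 0$, the MDPD estimator $\hat\theta_{\alpha,n}$ defined by
(4) with the pseudo conditional density $f_\theta(y|x)$ converges almost surely to $\theta_\alpha^*$.

Remark 2. In the case of [NT], by the compactness of $\Theta$, we can take some constants $\underline{b}, \overline{b}, \underline{\mu}, \overline{\mu}, \underline{\sigma}$ and $\overline{\sigma}$
such that $\Theta \subset [\underline{b}, \overline{b}]^{q} \times [\underline{\mu}, \overline{\mu}] \times [\underline{\sigma}, \overline{\sigma}]^{2}$, where $q$ is the dimension of $\beta$ and $0 < \underline{\sigma} < \overline{\sigma} < \infty$. In what follows, without loss of generality, we
assume $\Theta = [\underline{b}, \overline{b}]^{q} \times [\underline{\mu}, \overline{\mu}] \times [\underline{\sigma}, \overline{\sigma}]^{2}$ under the case of [NT]. Similarly, when the case [NE] is considered, $\Theta$ is
assumed to be $[\underline{b}, \overline{b}]^{q} \times [\underline{\sigma}, \overline{\sigma}]^{2}$.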


Assumptions A1-A3 are general conditions in practice, so it suffices to check whether assumption A4 holds
to ensure the consistency of the MDPD estimator. In the cases of [NT] and [NE], one can readily get
global upper bounds for the pseudo conditional densities. That is, when [NT] is considered, we have

$$
f_\theta(y|x) \le \frac{1}{\underline{\sigma}} \left\{ \Phi\!\left( -\frac{\max(|\underline{\mu}|, |\overline{\mu}|)}{\underline{\sigma}} \right) \right\}^{-1} \phi(0).
$$


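When [NE] is considered, we can obtain the following upper bound:

$$
f_\theta(y|x) \le \frac{1}{\underline{\sigma}} \exp\!\left(\frac{\overline{\sigma}^2}{2\underline{\sigma}^2}\right) \max\left\{ 1,\ \sup_{z > 0} \Phi\!\left(-z - \frac{\underline{\sigma}}{\overline{\sigma}}\right) \exp\!\left(\frac{\overline{\sigma}}{\underline{\sigma}}\, z\right) \right\}.
$$

Using the fact that $\Phi(x) \le e^{-x^2/2}$ for all $x < 0$, we can see that the RHS of the above inequality is finite.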

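In order to obtain the asymptotic normality, we impose additional assumptions. Throughout this paper,
$\partial_\theta$ and $\partial^2_{\theta\theta^T}$ denote $\frac{\partial}{\partial\theta}$ and $\frac{\partial^2}{\partial\theta\,\partial\theta^T}$, respectively, and the symbol $\|\cdot\|$ denotes the $l_1$ norm for matrices and vectors.

A5. $\theta_\alpha^*$ lies in the interior of $\Theta$.

A6. $K_\alpha := E\left[\partial_\theta H_\alpha(X, Y; \theta_\alpha^*)\, \partial_{\theta^T} H_\alpha(X, Y; \theta_\alpha^*)\right] < \infty$.

A7. $E \sup_{\theta \in \Theta} \left\| \partial^2_{\theta\theta^T} H_\alpha(X, Y; \theta) \right\| < \infty$.

A8. $J_\alpha := E\left[\partial^2_{\theta\theta^T} H_\alpha(X, Y; \theta_\alpha^*)\right]$ is positive definite.

Then, we have the second asymptotic result of the MDPD estimator.

Theorem 2.2. Assume that assumptions A1-A8 hold. Then, for each $\alpha > 0$,

$$
\sqrt{n}\,(\hat\theta_{\alpha,n} - \theta_\alpha^*) \xrightarrow{d} N\!\left(0,\ J_\alpha^{-1} K_\alpha J_\alpha^{-1}\right).
$$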

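Remark 3. In the case of $f(y|x) = f_{\theta_0}(y|x)$ for some $\theta_0 \in \Theta$, we have

$$
J_\alpha = (1+\alpha)\, E\left[ f_{\theta_0}^{\alpha-2}(Y|X)\, \partial_\theta f_{\theta_0}(Y|X)\, \partial_{\theta^T} f_{\theta_0}(Y|X) \right],
$$

$$
K_\alpha = (1+\alpha)^2 \left\{ E\left[ f_{\theta_0}^{2\alpha-2}(Y|X)\, \partial_\theta f_{\theta_0}(Y|X)\, \partial_{\theta^T} f_{\theta_0}(Y|X) \right] - E\left[ \xi_\alpha \xi_\alpha^T \right] \right\},
$$

where $\xi_\alpha = \int f_{\theta_0}^{\alpha}(y|X)\, \partial_\theta f_{\theta_0}(y|X)\,dy$.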

For $\alpha > 0$, assumptions A6 and A7 can be ensured by simpler conditions in the cases of [NT] and [NE].
Indeed, the following proposition provides a sufficient condition for A6 and A7.

Proposition 2.1. Assume that $\Theta$ is compact and $g(x, \beta)$ is twice differentiable w.r.t. $\beta$ for all $x$. Under
the cases of [NT] and [NE], if $E[\sup_{\theta \in \Theta} \|\partial_\beta g(X, \beta)\, \partial_{\beta^T} g(X, \beta)\|] < \infty$ and $E[\sup_{\theta \in \Theta} \|\partial^2_{\beta\beta^T} g(X, \beta)\|] < \infty$,
then A6 and A7 hold for $\alpha > 0$.

Remark 4. In the case where $g(x, \beta)$ is a linear function of $x$, i.e., $g(x, \beta) = \beta^T x$, one can see that
$E[\sup_{\theta \in \Theta} \|\partial_\beta g(X, \beta)\, \partial_{\beta^T} g(X, \beta)\|] = E\|XX^T\|$ and $E \sup_{\theta \in \Theta} \|\partial^2_{\beta\beta^T} g(X, \beta)\| = 0$. Hence, the conditions
in the proposition reduce to $E\|XX^T\| < \infty$. This condition is not a serious restriction in empirical analysis,
because it is usual to regard the input variables as limited resources, which implies that the input vector $X$
can be assumed to be finite. In other cases, $E\|XX^T\| < \infty$ together with the compactness of $\Theta$ and the
continuity of $\partial_\beta g$ and $\partial^2_{\beta\beta^T} g$ can be a sufficient condition for A6 and A7.

Proofs for the results in this subsection are provided in the Appendix.
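In practice, the sandwich form of the asymptotic covariance suggests a plug-in estimate of standard errors. The following is a generic sketch for any estimator defined by minimizing an average of per-observation losses, not the authors' implementation: the derivatives are taken by finite differences, the matrix in the middle is estimated by the average outer product of per-observation gradients at the estimate, and the function name `sandwich_cov`, its `step` argument, and the toy loss are all assumptions introduced here.

```python
import numpy as np

def sandwich_cov(loss_i, theta_hat, data, step=1e-4):
    """Plug-in estimate of J^{-1} K J^{-1} / n for a minimizer of mean(loss_i).

    loss_i(theta, data) must return the vector of per-observation losses.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    k = theta_hat.size
    n = len(loss_i(theta_hat, data))
    e = np.eye(k) * step

    # per-observation gradients by central differences, shape (n, k)
    grads = np.column_stack([
        (loss_i(theta_hat + e[j], data) - loss_i(theta_hat - e[j], data)) / (2 * step)
        for j in range(k)
    ])
    K = grads.T @ grads / n

    def mean_loss(t):
        return np.mean(loss_i(t, data))

    # Hessian of the mean loss by central differences
    J = np.empty((k, k))
    for a in range(k):
        for b in range(k):
            J[a, b] = (mean_loss(theta_hat + e[a] + e[b]) - mean_loss(theta_hat + e[a] - e[b])
                       - mean_loss(theta_hat - e[a] + e[b]) + mean_loss(theta_hat - e[a] - e[b])
                       ) / (4 * step**2)
    J_inv = np.linalg.inv(J)
    return J_inv @ K @ J_inv / n

# Toy check: squared-error loss for a mean, where the answer is var(x) / n.
rng = np.random.default_rng(0)
x = rng.normal(2.0, 3.0, 400)
square_loss = lambda theta, d: 0.5 * (d - theta[0]) ** 2
cov = sandwich_cov(square_loss, [x.mean()], x)
print(cov[0, 0], x.var() / len(x))
```

For an MDPD fit, `loss_i` would return the per-observation values of $H_\alpha$ at a given parameter vector; analytic derivatives would be more accurate than finite differences when they are available.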

2.4 The influence function of the MDPD estimator 

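In this subsection, we discuss the influence function of the MDPD estimator to describe the effect of
infinitesimal contamination. Letting $F$ be the true distribution of $(X, Y)$, the functional $T(F)$ corresponding
to the MDPD estimator can be defined as

$$
T(F) := \arg\min_{\theta \in \Theta} \int_{\mathbb{R}^{p+1}} H_\alpha(x, y; \theta)\,dF.
$$

Note that since

$$
\int H_\alpha(x, y; \theta)\,dF =
\begin{cases}
\displaystyle E\left[d_\alpha(f(\cdot|X), f_\theta(\cdot|X))\right] - \frac{1}{\alpha} E\left[f^{\alpha}(Y|X)\right], & \alpha > 0, \\[2ex]
E\left[d_0(f(\cdot|X), f_\theta(\cdot|X))\right] - E\left[\log f(Y|X)\right], & \alpha = 0,
\end{cases}
\tag{9}
$$

and $d_\alpha(f(\cdot|X), f_\theta(\cdot|X))$ has a minimum value at $\theta_\alpha^*$ almost surely, $T(F)$ becomes $\theta_\alpha^*$. For $\epsilon \in [0, 1]$, denote
by $F_\epsilon$ the contaminated distribution of the form:

$$
F_\epsilon = (1 - \epsilon) F + \epsilon\, \delta_{(x_0, y_0)},
$$

where $\delta_{(x_0, y_0)}$ has all its mass at the point $(x_0, y_0)$. Then, the functional $T(F_\epsilon)$ satisfies the following
equation:

$$
(1 - \epsilon) \int_{\mathbb{R}^{p+1}} \partial_\theta H_\alpha(x, y; T(F_\epsilon))\,dF + \epsilon\, \partial_\theta H_\alpha(x_0, y_0; T(F_\epsilon)) = 0.
$$

Hence, taking the derivative of the LHS of the above equation w.r.t. $\epsilon$ and putting $\epsilon = 0$, the influence
function of $T$ at $F$ is obtained as

$$
IF_\alpha(x_0, y_0; T, F) = -\left\{ \int \partial^2_{\theta\theta^T} H_\alpha(x, y; T(F))\,dF \right\}^{-1} \partial_\theta H_\alpha(x_0, y_0; T(F)) = -J_\alpha^{-1}\, \partial_\theta H_\alpha(x_0, y_0; \theta_\alpha^*).
$$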


Using (23), (25), (29) and Lemma 6.3 in Appendix, we have the following result. 


Proposition 2.2. Assume that $\Theta$ is compact and $g(x, \beta)$ is differentiable w.r.t. $\beta$ for all $x$. Under the cases
of [NT] and [NE], we have that for $\alpha > 0$ and $\theta \in \Theta$,

$$
\|\partial_\theta H_\alpha(x, y; \theta)\| \le C\left(1 + \|\partial_\beta g(x, \beta)\|\right),
$$

where $C$ is a constant free from $x$, $y$ and $\beta$.

The proposition states that the influence function of the MDPD estimator with $\alpha > 0$ using (6) or (8)
is bounded in $y_0$ regardless of the form of $g(x, \beta)$, and the boundedness in $x_0$ is determined by the boundedness
of $\partial_\beta g(x, \beta_\alpha^*)$. Hence, examining $\partial_\beta g(x, \beta)$, one can see whether the influence function of the estimator is
bounded or not. For instance, if the input vector $X$ is assumed to be finite as mentioned in Remark 4, the
continuity of $\partial_\beta g(x, \beta_\alpha^*)$ yields

$$
\sup_{(x, y) \in \mathbb{R}^{p+1}} \|IF_\alpha(x, y; T, F)\| < \infty,
$$

which means that the MDPD estimator with $\alpha > 0$ has a finite gross error sensitivity. The case of $g(x, \beta) = \beta^T x$
satisfies the condition.

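On the other hand, the influence function of the QML estimator is unbounded in $y_0$. To see this, note
that $\partial_\theta H_0(x, y; \theta) = -\partial_\theta f_\theta(y|x)/f_\theta(y|x)$. Using the notation introduced in the Appendix (see (20)), we have that under the case
of [NT],

$$
\|\partial_\theta H_0(x, y; \theta)\| = |D_{1,\beta}|\, \|\partial_\beta g(x, \beta)\| + |D_{1,\mu}| + |D_{1,v}| + |D_{1,u}|,
$$

and under the case of [NE],

$$
\|\partial_\theta H_0(x, y; \theta)\| = |D_{2,\beta}|\, \|\partial_\beta g(x, \beta)\| + |D_{2,v}| + |D_{2,u}|.
$$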
One can readily check that each of the above two equations contains unbounded terms. For instance,
some of the terms $D_{1,\cdot}$ and $D_{2,\cdot}$ include quantities that are obviously unbounded in $y_0$. Thus, we have

$$
\sup_{(x, y) \in \mathbb{R}^{p+1}} \|IF_0(x, y; T, F)\| = \infty.
$$

Therefore, we can conclude that the MDPD estimator with $\alpha > 0$ has a robust property while the QML
estimator does not.
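The contrast between the two cases can be illustrated numerically for the normal - half normal density (7) with an intercept-only frontier. The sketch below is an illustration under assumed parameter values ($\sigma_u = 1$, $\sigma_v = 0.5$, $\alpha = 0.3$), not a computation from the paper. It evaluates the intercept component of the score $\partial_\theta \log f_\theta$, which drives the QML influence function, and the same component multiplied by the density power weight $f_\theta^{\alpha}$, which is the only part of $\partial_\theta H_\alpha$ that depends on $y_0$.

```python
import numpy as np
from scipy.special import log_ndtr

def log_pdf_nhn(eps, sigma_u, sigma_v):
    """Log of the normal - half normal density (7) at eps = y - g(x, beta)."""
    sigma = np.hypot(sigma_u, sigma_v)
    lam = sigma_u / sigma_v
    return (np.log(2.0 / sigma) - 0.5 * (eps / sigma) ** 2 - 0.5 * np.log(2 * np.pi)
            + log_ndtr(-eps * lam / sigma))

def score_intercept(eps, sigma_u, sigma_v):
    """Derivative of log f with respect to the intercept of the frontier."""
    sigma = np.hypot(sigma_u, sigma_v)
    lam = sigma_u / sigma_v
    t = -eps * lam / sigma
    # ratio phi(t) / Phi(t), computed on the log scale to avoid underflow
    ratio = np.exp(-0.5 * t**2 - 0.5 * np.log(2 * np.pi) - log_ndtr(t))
    return eps / sigma**2 + lam / sigma * ratio

alpha, sigma_u, sigma_v = 0.3, 1.0, 0.5          # illustrative values
eps = np.array([-30.0, -3.0, 0.0, 3.0, 30.0])    # residuals, including extreme ones
score = score_intercept(eps, sigma_u, sigma_v)
weighted = score * np.exp(alpha * log_pdf_nhn(eps, sigma_u, sigma_v))

for e, s, w in zip(eps, score, weighted):
    print(f"eps={e:6.1f}  score={s:10.3f}  weighted score={w:12.3e}")
```

The unweighted score grows without bound in both tails, while the weighted version vanishes for extreme residuals, in line with the bounded influence function for $\alpha > 0$.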


2.5 The choice of optimal $\alpha$

Choosing an optimal $\alpha$ is an important issue in empirical studies. Taking a rather conservative approach, a
small $\alpha$ is recommended because too large an $\alpha$ may result in a significant loss in efficiency when the portion
of outliers is not as large as speculated. Several studies on the problem are found in the literature.
Warwick and Jones (2005) proposed a selection rule for $\alpha$ that minimizes an asymptotic estimate of
the mean squared error. Fujisawa and Eguchi (2006) proposed an adaptive method based on an empirical
approximation of the Cramer-von Mises divergence. Durio and Isaia (2011) considered a data-driven method
based on the similarity measure between the MDPD estimate and the ML estimate.

In our real data analysis, we employ the procedure of Durio and Isaia (2011) to select an optimal $\alpha$.
More specifically, suppose that a sample $\{(x_i, y_i)\}_{i=1}^{n}$ is observed from a regression model $Y = m_\beta(X) + \epsilon$,
where $X = (X_1, \ldots, X_p)$ and the variance of $\epsilon$ is $\sigma^2$. Then, let $T_0$ and $T_1$ be two regression estimators for $\beta$.
Now, we wish to choose one of the two estimators. To do so, Durio and Isaia (2011) proposed the following
normalized index to measure the similarity between two estimates, say $\hat\beta_{T_0}$ and $\hat\beta_{T_1}$. Letting

$$
\begin{aligned}
P &= [\min_i x_{i1}, \max_i x_{i1}] \times \cdots \times [\min_i x_{ip}, \max_i x_{ip}], \\
C &= P \times [\min_i y_i, \max_i y_i], \\
D &= \left\{ (x, y) : \min\!\left(m_{\hat\beta_{T_0}}(x), m_{\hat\beta_{T_1}}(x)\right) \le y \le \max\!\left(m_{\hat\beta_{T_0}}(x), m_{\hat\beta_{T_1}}(x)\right),\ x \in P \right\} \cap C,
\end{aligned}
$$

the similarity index is defined by

$$
sim(T_0, T_1) = \frac{\int_D dt}{\int_C dt}.
$$

If two estimates $\hat\beta_{T_0}$ and $\hat\beta_{T_1}$ are close, then $sim(T_0, T_1)$ will be close to zero. In order to investigate
whether $\hat\beta_{T_0}$ and $\hat\beta_{T_1}$ are close, they used the simplified Monte Carlo significance (MCS) test based on the
above statistics. That is, after generating $m - 1$ bootstrap samples of size $n$, $sim^*(T_0, T_1)$ is calculated
for each bootstrap sample to obtain a critical value. Here, the bootstrap sample $\{(Y_i^*, X_i)\}_{i=1}^{n}$ is sampled from
$Y_i^* = m_{\hat\beta_{T_0}}(x_i) + \epsilon_i$, where $\epsilon_i$ is generated from a specified distribution with mean zero and variance $\hat\sigma^2_{T_0}$. If
$sim(T_0, T_1)$ is less than the maximum value of $sim^*(T_0, T_1)$, we accept the null hypothesis ($H_0$) of $\beta = \hat\beta_{T_0}$
at a significance level of $1/m$, and conclude that $\hat\beta_{T_0}$ and $\hat\beta_{T_1}$ are close. This test can be used to check for
outliers. For example, if $T_0$ is the ML estimator and $T_1$ is a robust estimator, accepting $H_0$ means that
no outlier is detected and, therefore, we select the ML estimate owing to its efficiency. Based on this, the
procedure for selecting $\alpha$ is as follows:

1. In order to check for the existence of outliers, conduct the simplified MCS test with the ML estimator
($T_0$) and the MDPD estimator with $\alpha = \alpha^*$ ($T_1$), for some $0 < \alpha^* \le 1$.

2. If the MCS test leads us to accept $H_0$, then we decide that outliers are absent and, thus, the ML
estimate is selected.

3. If not, we again perform the MCS test with the MDPD estimators with tuning parameters $\alpha$ ($T_0$) and $\alpha^*$ ($T_1$),
increasing $\alpha$ until the first time we can accept $H_0$.
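The similarity index and the accept/reject rule used in the steps above can be sketched for the case of a single regressor. This is a simplified illustration, not the implementation of Durio and Isaia (2011): the areas are approximated with the trapezoidal rule on a grid, the fitted curves are passed in as hypothetical callables `fit0` and `fit1`, and the generation of bootstrap samples is left to the caller.

```python
import numpy as np

def similarity_index(x, y, fit0, fit1, grid=2001):
    """sim(T0, T1) for one regressor: area between two fitted curves inside C,
    divided by the area of the rectangle C spanned by the data."""
    x_lo, x_hi = np.min(x), np.max(x)
    y_lo, y_hi = np.min(y), np.max(y)
    xs = np.linspace(x_lo, x_hi, grid)
    # clipping to [y_lo, y_hi] implements the intersection of D with C
    m0 = np.clip(fit0(xs), y_lo, y_hi)
    m1 = np.clip(fit1(xs), y_lo, y_hi)
    gap = np.abs(m0 - m1)
    area_d = np.sum((gap[1:] + gap[:-1]) / 2.0 * np.diff(xs))  # trapezoidal rule
    return area_d / ((x_hi - x_lo) * (y_hi - y_lo))

def mcs_accept(sim_obs, sim_boot):
    """Simplified MCS test: accept H0 when sim_obs is below the largest bootstrap value."""
    return bool(sim_obs < np.max(sim_boot))

# Two parallel fitted lines, one unit apart, on data spanning [0, 1] x [0, 2].
x = np.array([0.0, 1.0])
y = np.array([0.0, 2.0])
low = lambda xs: np.full_like(xs, 0.5)
high = lambda xs: np.full_like(xs, 1.5)
print(similarity_index(x, y, low, high))
print(similarity_index(x, y, low, low))
```

In the selection procedure, `sim_boot` would hold the $m - 1$ values of $sim^*(T_0, T_1)$ computed from bootstrap samples generated under the $T_0$ fit.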

3 Simulation study 

In this section, we evaluate the finite-sample performance of the MDPD estimator with $\alpha > 0$ and compare
it with the ML estimator. For this task, we consider the following model:

$$
Y = \beta_0 + \beta_1 X + V - U,
\tag{10}
$$

where $X \sim U(0, 1)$, $V \sim N(0, \sigma_v^2)$ and $U \sim N^+(0, \sigma_u^2)$. The true parameter vector $(\beta_0, \beta_1, \sigma_v^2, \sigma_u^2)$ is
considered to be $(5, 5, 0.75, 1)$. We generate 1,000 samples of size $n = 500$ and, for each sample, the ML
estimates and the MDPD estimates with $\alpha \in \{0.05, 0.1, 0.2, 0.3, 0.5, 0.75, 1\}$ are obtained. Based on 1,000
repetitions, the mean, standard deviation (SD), and the sample mean squared error (MSE) of each estimate
are calculated. In order to assess the performance, the following measure is considered:

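$$
d = \left(\frac{\hat\beta_0 - \beta_0}{\beta_0}\right)^2 + \left(\frac{\hat\beta_1 - \beta_1}{\beta_1}\right)^2 + \left(\frac{\hat\sigma_v^2 - \sigma_v^2}{\sigma_v^2}\right)^2 + \left(\frac{\hat\sigma_u^2 - \sigma_u^2}{\sigma_u^2}\right)^2.
$$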
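We also estimate the individual TE using the estimator proposed by Battese and Coelli (1988), which is
based on the ML estimate and the MDPD estimates. Then, we calculate the MSE of the estimated TEs.
That is,

$$
MSE_{TE} := \frac{1}{n} \sum_{i=1}^{n} \left(\widehat{TE}_i - TE_i\right)^2,
$$

where $TE_i$ is the true TE given by $\exp(-U_i)$ and $\widehat{TE}_i$ is obtained by

$$
\widehat{TE}_i = \frac{\Phi(\mu_i^*/\sigma^* - \sigma^*)}{\Phi(\mu_i^*/\sigma^*)} \exp\!\left(-\mu_i^* + \frac{\sigma^{*2}}{2}\right),
\tag{11}
$$

where $\mu_i^* = -(Y_i - \hat\beta_0 - \hat\beta_1 X_i)\,\hat\sigma_u^2/(\hat\sigma_v^2 + \hat\sigma_u^2)$ and $\sigma^{*2} = \hat\sigma_v^2 \hat\sigma_u^2/(\hat\sigma_v^2 + \hat\sigma_u^2)$. Now, we compare the performance
based on the means of $d$ and the $MSE_{TE}$.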
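One replication of this design can be sketched as follows. The code is an illustration rather than the simulation code of the study: it assumes that 0.75 and 1 are the variances of $V$ and $U$, draws the half normal inefficiency as the absolute value of a normal variate, and evaluates the TE estimator (11) at the true parameter values instead of at ML or MDPD estimates. The function names are hypothetical.

```python
import numpy as np
from scipy.special import ndtr

def simulate_sf(n, beta0, beta1, var_v, var_u, rng):
    """Draw (X, Y, U) from model (10) with a half normal inefficiency term."""
    x = rng.uniform(0.0, 1.0, n)
    v = rng.normal(0.0, np.sqrt(var_v), n)
    u = np.abs(rng.normal(0.0, np.sqrt(var_u), n))   # N+(0, var_u)
    return x, beta0 + beta1 * x + v - u, u

def te_battese_coelli(resid, var_v, var_u):
    """TE estimator (11), evaluated at residuals Y_i - beta0 - beta1 * X_i."""
    mu_star = -resid * var_u / (var_v + var_u)
    sigma_star = np.sqrt(var_v * var_u / (var_v + var_u))
    ratio = ndtr(mu_star / sigma_star - sigma_star) / ndtr(mu_star / sigma_star)
    return ratio * np.exp(-mu_star + 0.5 * sigma_star**2)

rng = np.random.default_rng(1)
beta0, beta1, var_v, var_u = 5.0, 5.0, 0.75, 1.0
x, y, u = simulate_sf(500, beta0, beta1, var_v, var_u, rng)

te_true = np.exp(-u)
te_hat = te_battese_coelli(y - beta0 - beta1 * x, var_v, var_u)
mse_te = np.mean((te_hat - te_true) ** 2)
print(f"MSE of estimated TE at the true parameters: {mse_te:.4f}")
```

Replacing the true parameters in the call to `te_battese_coelli` with estimates from each method gives the $MSE_{TE}$ values compared in the tables.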


[Table 1 about here.] 


First, we deal with the case where the observations are not contaminated by outliers. The estimation
results are reported in Table 1, where the figures marked by the symbol * denote the minimal MSE, $d$, and
$MSE_{TE}$. It can be seen that the MDPD estimators with $\alpha = 0.05$ and 0.1 slightly outperform the ML
estimator, and the MDPD estimator with $\alpha = 0.2$ performs similarly to the ML estimator. This is interesting
because we had anticipated that the ML estimator would perform best. Nonetheless, we could expect that
the ML estimator would show the best performance as the sample size increases. The point is that the
performance of the MDPD estimator with $\alpha$ close to 0 is similar to that of the ML estimator, and the efficiency of
the MDPD estimator decreases with an increase in $\alpha$. The results in Table 1 confirm this finding.

Next, we examine the case in which outliers are involved in the observations. For this, we generate
two types of contaminated samples. The first considers upward outliers and is generated as follows: i)
generate the uncontaminated sample $\{(X_i, Y_i)\}_{i=1}^{n}$ from the model (10), and outliers $\{(X_j^o, Y_j^o)\}_{j=1}^{n^o}$ by $Y^o = \beta_0 + \beta_1 X^o + p_v \sigma_v$,
where $X^o \sim$ i.i.d. $U(0, 1)$; ii) replace $n^o$ observations in $\{(X_i, Y_i)\}_{i=1}^{n}$ by $\{(X_j^o, Y_j^o)\}_{j=1}^{n^o}$.
In the second type of contamination, $n^o$ observations in the uncontaminated sample are replaced by
$\{(X_j^o, Y_j^o)\}_{j=1}^{n^o}$, where $X^o \sim$ i.i.d. $U(0, 1)$ and $Y^o \sim$ i.i.d. $U(0.5, 1)$, to create downward outliers. Hence, the
first sample describes a situation in which some companies or individuals achieve a relatively high efficiency,
whereas the second considers low efficiency cases. For the simulation, $n^o = 3$ and $p_v = 5$ are considered.


[Table 2 about here.] 
[Table 3 about here.] 
[Figure 1 about here.]


Tables 2 and 3 present the estimation results for the upward and downward contamination cases,
respectively. The box plots of the ML and MDPD estimates are displayed in Figure 1. Here, the upper and
lower panels show the upward and downward outlier cases, respectively. In each box plot, the horizontal red
line represents the true parameter values. We first note that all the MDPD estimators under consideration
produce a smaller mean for $d$ than that produced by the ML estimator. In particular, the estimator with
$\alpha$ between 0.2 and 0.5 yields quite a small mean of $d$ relative to the mean of the ML estimator. This
indicates that the MDPD estimator performs better than the ML estimator does. As shown in Table 2 and
the upper panel of Figure 1, the ML estimator yields severe underestimates of $\beta_0$ and $\sigma_u^2$ and an overestimate
of $\sigma_v^2$, whereas the MDPD estimator with $\alpha \ge 0.2$ estimates the parameters properly. Here, it is important
to note that the underestimation of $\sigma_u^2$ leads to an overestimate of the TE values. On the other hand, the
case of the downward outlier contamination shows different results. As can be seen in Table 3 and the lower
panel of Figure 1, $\sigma_u^2$ and $\beta_0$ are overestimated and $\sigma_v^2$ is underestimated by the ML estimator. In both
contamination cases, $\beta_1$ does not seem to be affected by the outliers. Although not shown here, as more
data are contaminated by outliers (i.e., as $n^o$ or $p_v$ increases), the MDPD estimator performs increasingly
better than the ML estimator does. From these simulation results, we confirm that the MDPD estimator
possesses much more robust properties than the ML estimator does.

4 Real data analysis 

This section provides the empirical data analysis, consisting of two subsections. The first subsection describes 
the data set used in the empirical study. The second subsection provides the QML and MDPD estimation 
results, including the procedures for checking outliers and selecting an optimal $\alpha$. Based on the results, we
then calculate and compare the estimated TEs. 

4.1 Data 

We investigate the distribution of TE scores for Korean manufacturing firms. To do so, we use firm-level 
financial statement data taken from the Korea Information Services (KIS-VALUE) in 2007. To measure 
the TE scores, we collect data on value-added ($Y$, output), capital stock ($K$, input), and labor ($L$, input).
Fixed assets are used as a proxy for capital stock, defined as the sum of five components: land,
buildings, construction, vehicles, and machine tools. The number of employees is used for the labor variable.
Observations with negative $Y$ have been removed from the original data. The number of firms in our
final data set is 2,031.


[Table 4 about here.] 

[Figure 2 about here.] 

Table 4 provides summary statistics, including means, medians, and standard deviations. For all variables,
the mean value is much larger than the median value, and the skewness values of the value-added, capital, and labor
variables are calculated to be 14.11, 12.27, and 12.98, respectively. This indicates that the distributions of
all variables are severely skewed to the right. Clearly, our data set has some firms that operate with large
amounts of inputs and outputs, and some firms operating with very small amounts are also included. In
particular, note that a few firms are observed to produce comparatively small output relative to average production,
as depicted in Figure 2, which displays the scatter plot of the pairs of $\log(Y/L)$ and $\log(K/L)$. In this study,
we emphasize that these low- or high-performing firms could be influential observations, acting like outliers.
As demonstrated in our simulation study, these are highly likely to have an undesirable effect on the ML
estimation, which also affects the TE estimate. Hence, in the next subsection, we estimate the SF model
using the QML and MDPD estimation methods. We also fit the SF model to the data set in which very low-
or high-performing firms are removed and compare the results.


4.2 Estimation results 

In order to investigate the distribution of the technical efficiency scores, we employ the Cobb-Douglas
production function assuming constant returns-to-scale. Then, logs of value-added per employee and capital
stock per employee (i.e., $\log(Y/L)$ and $\log(K/L)$, respectively) are considered as augmented output and
input variables in the regression model. The production function form with random error $V_i$ and technical
inefficiency $U_i$ is given by

$$
\log(Y_i/L_i) = \beta_0 + \beta_1 \log(K_i/L_i) + V_i - U_i.
\tag{12}
$$

In this analysis, we consider the normal and the half normal distributions as the pseudo distributions for
$V$ and $U$, respectively, as is usually done in most empirical studies. That is, $V \sim N(0, \sigma_v^2)$ and $U \sim N^+(0, \sigma_u^2)$
are assumed and thus the pseudo conditional distribution for (12) is given by (7). The parameter
$\theta = (\beta_0, \beta_1, \sigma^2, \gamma)$, where $\sigma^2 = \sigma_v^2 + \sigma_u^2$ and $\gamma = \sigma_u/\sigma_v$, is estimated using the QML estimator and the
MDPD estimator with $\alpha$ between 0.05 and 1. However, we only report the results corresponding to $\alpha$ in
$\{0.05, 0.1, 0.2, 0.3, 0.4, 0.5\}$ because the MDPD estimator with $\alpha$ greater than 0.5 produces estimates of $\gamma$
close to the boundary.

[Table 5 about here.] 

Table 5 presents the QML and the MDPD estimation results. The figures in parentheses denote the
standard errors. There are significant differences between the QML estimates and the MDPD estimates.
The estimates of $\beta_0$, $\sigma^2$, and $\gamma$ show a decreasing trend as $\alpha$ increases, which is similar to the simulation
result in which downward outliers exist, as shown in Table 3. However, the estimates of $\beta_1$ vary to some
extent according to the estimators. It is important to note that the QML estimator produces relatively large
estimates of these parameters. The scatter plot of observations and the estimated frontier lines are displayed in Figure 2.
The dashed and solid lines represent the frontier production function estimated by the QML estimate and
the MDPD estimate with $\alpha = 0.3$, respectively. As shown in the figure, the fact that the dashed line lies over
the solid line, along with the estimation results in Table 5, presumably indicates that the data set contains
observations acting like downward outliers.

[Table 6 about here.] 


For this reason, we first investigate whether outliers exist. To this end, we conduct the MCS test procedure 
introduced in subsection 2.5 at a significance level of 1%, that is, the case of $m = 99$. A bootstrap sample, 
$\{(\log(Y/L)_i^*, K_i/L_i)\}_{i=1}^{n}$, is generated from $\log(Y/L)_i^* = \hat\beta_{0,T_0} + \hat\beta_{1,T_0} \log(K_i/L_i) + V - U$, where $V \sim N(0, \hat\sigma_{v,T_0}^2)$ 
and $U \sim N^+(0, \hat\sigma_{u,T_0}^2)$. First, we compare the QML estimator ($T_0$) and the MDPD estimator with $\alpha = 0.5$ 
($T_1$). In this case, the similarity index, $sim(T_0, T_1)$, and the maximum value of $sim^*(T_0, T_1)$ are calculated 
to be 0.046 and 0.015, respectively. Since $sim(T_0, T_1)$ is larger than the maximum of $sim^*(T_0, T_1)$, we reject 
the null hypothesis of $\theta = \theta_{T_0}$, signifying that outliers do exist in the data. Next, we repeat the MCS test to 
select an optimal $\alpha$. The test results are summarized in Table 6 and show that the optimal value of the tuning 
parameter corresponds to $\alpha = 0.3$. We therefore conclude that the optimal estimate of the Cobb-Douglas 
production model should be $\log(Y/L) = 6.929 + 0.382 \log(K/L)$ with $\hat\sigma_v^2 = 0.124$ and $\hat\sigma_u^2 = 0.126$, which 
corresponds to the MDPD estimate with $\alpha = 0.3$, and, thus, TEs should be calculated using the MDPD 
estimate. 
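
The bootstrap step and the decision rule of the test can be sketched as follows. This is only an illustration under stated assumptions: the similarity index itself is defined in subsection 2.5 and is not reimplemented here, so `mcs_reject` simply takes the observed index and the $m$ bootstrap indices as inputs. The function names are hypothetical, and the parameter values are the QML estimates reported in Table 5.

```python
import numpy as np


def bootstrap_log_output(log_k, b0, b1, sigma_v, sigma_u, rng):
    """Draw log(Y/L)* = b0 + b1 * log(K/L) + V - U under the fitted model T0."""
    v = rng.normal(0.0, sigma_v, size=log_k.shape)          # V ~ N(0, sigma_v^2)
    u = np.abs(rng.normal(0.0, sigma_u, size=log_k.shape))  # U ~ N+(0, sigma_u^2)
    return b0 + b1 * log_k + v - u


def mcs_reject(sim_observed, sim_bootstrap):
    """Reject H0 when the observed index exceeds all m bootstrap indices.

    With m bootstrap replicates the nominal level is 1 / (m + 1),
    e.g. 1% for m = 99.
    """
    return bool(sim_observed > np.max(sim_bootstrap))


rng = np.random.default_rng(1)
log_k = np.full(20000, 3.0)  # hypothetical regressor values
y_star = bootstrap_log_output(log_k, 7.450, 0.354, np.sqrt(0.148), np.sqrt(0.423), rng)
print(y_star.mean())

# Values reported in the text: sim = 0.046, max sim* = 0.015.
print(mcs_reject(0.046, [0.015]))
```

In practice each bootstrap sample would be refitted with both estimators, and the resulting $m$ similarity indices would be passed to `mcs_reject`.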


[Figure 3 about here.] 

Accordingly, we calculate the TEs based on the MDPD estimate with $\alpha = 0.3$ using the Battese and Coelli 
(1988) estimator. Denote by $TE_{ML}$ and $TE_{MD}$ the TEs calculated using the QML and MDPD estimates, 
respectively. For comparison, we also compute $TE_{ML}$ (see Figure 3). Here, the left panel depicts the 
estimated densities of $TE_{ML}$ (black solid line) and $TE_{MD,\alpha=0.3}$ (red dashed line), and the right panel displays 
the scatter plot of the pairs $(TE_{MD,\alpha=0.3}, TE_{ML})$. Note that the QML estimate yields comparatively lower TE 
scores than does the MDPD estimate, mainly owing to the large estimate of $\sigma_u^2$. This result implies that if 
we were to rely only on the QML estimate, most of the firms would be measured as performing worse than 
they did in reality. 
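
For reference, the Battese and Coelli (1988) point estimator for the normal-half-normal model has the closed form $E[\exp(-U_i) \mid \epsilon_i] = \exp(-\mu_i^* + \sigma_*^2/2)\,\Phi(\mu_i^*/\sigma_* - \sigma_*)/\Phi(\mu_i^*/\sigma_*)$, with $\mu_i^* = -\epsilon_i \sigma_u^2/\sigma^2$ and $\sigma_*^2 = \sigma_u^2\sigma_v^2/\sigma^2$. The sketch below evaluates it in Python; the function name and the residual values are illustrative, and the variance estimates are those of the MDPD fit with $\alpha = 0.3$.

```python
import numpy as np
from scipy.stats import norm


def battese_coelli_te(eps, sigma_v, sigma_u):
    """E[exp(-U) | eps] for eps = V - U, V ~ N(0, sigma_v^2), U ~ N+(0, sigma_u^2)."""
    s2 = sigma_v ** 2 + sigma_u ** 2
    mu_star = -eps * sigma_u ** 2 / s2
    sig_star = sigma_u * sigma_v / np.sqrt(s2)
    z = mu_star / sig_star
    return np.exp(-mu_star + 0.5 * sig_star ** 2) * norm.cdf(z - sig_star) / norm.cdf(z)


# Hypothetical composed residuals, evaluated at sigma_v^2 = 0.124, sigma_u^2 = 0.126.
eps = np.linspace(-2.0, 1.0, 7)
te = battese_coelli_te(eps, np.sqrt(0.124), np.sqrt(0.126))
print(np.round(te, 3))
```

A larger estimate of $\sigma_u^2$ shifts the conditional distribution of $U_i$ upward and therefore lowers every TE score, which is the mechanism behind the lower QML-based scores noted above.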

[Figure 4 about here.] 

[Table 7 about here.] 

[Table 8 about here.] 

[Figure 5 about here.] 


Finally, in order to illustrate the behaviors of the QML estimator and the MDPD estimator in the 
absence of very low- or high-performing firms, we additionally estimate the model (12) based on the data 
set from which such observations are removed. To obtain the cleaned data, we run an OLS regression of 
$\log(Y_i/L_i) = \beta_0 + \beta_1 \log(K_i/L_i) + \epsilon_i$ and then eliminate from the original data the firms whose 
studentized residual is larger than 3 in absolute value. The cleaned data and the estimated frontier lines are 
depicted in Figure 4, in which we can see that a few high- and several low-performing firms are removed. 
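
The cleaning rule can be sketched in a few lines of Python. The text does not say whether internally or externally studentized residuals are used; the sketch assumes the internal version, $r_i = e_i / (s\sqrt{1 - h_{ii}})$, and the function names and simulated data are illustrative only.

```python
import numpy as np


def studentized_residuals(x, y):
    """Internally studentized OLS residuals for the regression y = b0 + b1 * x + e."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    # Leverages h_ii: diagonal of the hat matrix X (X'X)^{-1} X'.
    leverage = np.einsum("ij,jk,ik->i", design, np.linalg.inv(design.T @ design), design)
    s2 = resid @ resid / (len(y) - design.shape[1])
    return resid / np.sqrt(s2 * (1.0 - leverage))


def drop_extreme(x, y, cutoff=3.0):
    """Keep observations whose absolute studentized residual is at most `cutoff`."""
    keep = np.abs(studentized_residuals(x, y)) <= cutoff
    return x[keep], y[keep], keep


# Simulated example with one very low-performing observation.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=200)
y[10] -= 5.0
x_clean, y_clean, keep = drop_extreme(x, y)
print(keep.sum())
```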

The estimation results are reported in Table 7. Compared with the figures in Table 5, it can be seen 
that the differences between the QML estimate and the MDPD estimates become comparatively small. This is 
consistent with the results of the MCS tests shown in Table 8, where the test results indicate that all the 
estimates under consideration are close and outliers are absent. The behavior of the MDPD estimates with 
small $\alpha$ is observed to be similar to that of the QML estimates. Based on these results, the model with the 
QML estimate would be optimal if one could validate that $V$ and $U$ follow the normal and the half-normal 
distributions, respectively. For the moment, however, we do not assert that the QML estimate is the best one 
because it is not easy to check the distributional assumptions. In the present case, we emphasize that 
the choice of $\alpha$ is not crucial because the QML estimate and the MDPD estimates show similar results, 
so that $TE_{ML}$ and $TE_{MD}$ do not differ significantly. As can be seen in Figure 5, the 
QML estimate and the MDPD estimate with $\alpha = 0.3$ yield similar TEs, in contrast to those in Figure 3. 

In summary, our data analysis suggests that the MDPD estimator can be a promising estimator 
for the SFA framework in the presence of very low- or high-performing firms. As mentioned earlier, the 
choice of an optimal $\alpha$ is an important issue, particularly when outliers are suspected in the data. While we 
introduced the procedure of Durio and Isaia (2011) as the selection rule, the implementation of the procedure 
could be computationally burdensome, especially when considering many explanatory variables. For other 
statistical models, as mentioned in subsection 2.1, existing studies have found that the MDPD estimator 
with a small $\alpha$ is robust enough against outliers while maintaining efficiency when there are no outliers. 
Thus, based on previous studies and the results of our simulation and empirical studies, we recommend values 
of $\alpha$ in [0.1, 0.4] in situations in which selecting an optimal $\alpha$ is difficult. 

5 Conclusion 

This study has proposed a robust estimation method for stochastic frontier models. Our robust estimator 
is constructed by minimizing the empirical version of the density power divergence introduced by Basu et 
al. (1998). In particular, the conditional density of the normal-truncated normal (or exponential) SF model 
is used in constructing the MDPD estimator regardless of the distributions of $V$ and $U$, and its asymptotic 
and robust properties are investigated. A selection rule for an optimal $\alpha$ is also introduced by adapting the 
procedure of Durio and Isaia (2011). Our simulation results indicate that the ML estimator is severely 
compromised by outliers. In contrast, the MDPD estimator with a small $\alpha$ shows strong robustness against 
outliers, with little loss in asymptotic efficiency relative to the ML estimator. Therefore, the proposed 
MDPD estimation method can be used when outliers are suspected to contaminate the data. We also apply 
the estimation method to a real data set having very low- or high-performing observations to illustrate the 
behaviors of the QML and the MDPD estimators. Our empirical study suggests that the estimator could be 
suitable for the case in which a few observations perform unusually well or poorly, as often occurs in empirical 
studies. 

Although we focus on a cross-sectional model, the estimation method can be extended to general SF 
models, including panel models. We leave this extension as a possible area of future research. 

6 Appendix 

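In this appendix, we provide proofs for the theorems and propositions stated in subsections 2.3 and 2.4. 

Proof of Theorem 2.1 
First, note that by assumption A2, 

$$\frac{1}{n}\sum_{i=1}^{n} H_\alpha(X_i, Y_i; \theta) \xrightarrow{a.s.} E[H_\alpha(X, Y; \theta)] = \iint H_\alpha(x, y; \theta) f(y|x) f_X(x)\,dy\,dx = E[d_\alpha(f(\cdot|X), f_\theta(\cdot|X))] - \frac{1}{\alpha}\iint f^{1+\alpha}(y|x) f_X(x)\,dy\,dx$$

and $E[H_\alpha(X, Y; \theta)]$ has a minimum at $\theta_\alpha^*$. In order to show the consistency of the MDPD estimator, it 
therefore suffices to derive the strong uniform convergence of the objective function. That is, 

$$\sup_{\theta \in \Theta}\left|\frac{1}{n}\sum_{i=1}^{n} H_\alpha(X_i, Y_i; \theta) - E[H_\alpha(X, Y; \theta)]\right| \xrightarrow{a.s.} 0, \tag{13}$$

which in turn implies that 

$$\hat\theta_{\alpha,n} = \operatorname*{argmin}_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^{n} H_\alpha(X_i, Y_i; \theta) \xrightarrow{a.s.} \operatorname*{argmin}_{\theta \in \Theta} E[H_\alpha(X, Y; \theta)].$$

While there are several sets of conditions to guarantee (13), we employ the following regularity conditions: 
(i) $\Theta$ is compact; (ii) $H_\alpha(x, y; \theta)$ is continuous in $\theta$ for all $x, y$; and (iii) $H_\alpha(X, Y; \theta)$ is dominated by an 
integrable random variable that is free from $\theta$ (see, for example, chapter 16 in Ferguson, 1996). Here, it is 
readily seen that (ii) holds by the continuity of $g(x, \beta)$. Also, in view of assumption A4, we have 

$$|H_\alpha(X, Y; \theta)| \le \int C^{\alpha} f_\theta(y|X)\,dy + \left(1 + \frac{1}{\alpha}\right) C^{\alpha} \le \left(2 + \frac{1}{\alpha}\right) C^{\alpha},$$

which establishes the theorem. 

□ 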


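Hereafter, we denote $H_i(\theta) := H_\alpha(X_i, Y_i; \theta)$ and $f_\theta := f_\theta(y|x)$ for notational convenience. Further, we 
shall use the relation $A \lesssim B$, where $A$ and $B$ are nonnegative, to denote that $A \le CB$ for some constant 
$C > 0$. For example, $A \lesssim 1$ means that $A$ is bounded by some constant $C$. 

Lemma 6.1. Suppose that assumption A7 holds. If $\hat\theta_{\alpha,n}$ converges almost surely to $\theta_\alpha^*$, then 

$$\frac{1}{n}\sum_{i=1}^{n} \partial^2_{\theta\theta^T} H_i(\hat\theta_{\alpha,n}) \xrightarrow{a.s.} E[\partial^2_{\theta\theta^T} H_i(\theta_\alpha^*)]. \tag{14}$$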

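Proof. Since $E[\partial^2_{\theta\theta^T} H_i(\theta_\alpha^*)]$ is finite by assumption A7, following an argument similar to that used in 
Lemma A2 in Ling and McAleer (2010), for any $\epsilon > 0$, we can take an $\eta_\epsilon > 0$ such that 

$$\lim_{l \to \infty} P\left(\max_{n \ge l} \sup_{\theta \in V_0(\eta_\epsilon)} \left\|\frac{1}{n}\sum_{i=1}^{n}\left\{\partial^2_{\theta\theta^T} H_i(\theta) - E[\partial^2_{\theta\theta^T} H_i(\theta_\alpha^*)]\right\}\right\| > \epsilon\right) = 0, \tag{15}$$

where $V_0(\eta_\epsilon) = \{\theta : \|\theta - \theta_\alpha^*\| < \eta_\epsilon\}$. In addition, since $\hat\theta_{\alpha,n}$ converges almost surely to $\theta_\alpha^*$, we also have that 
for any $\epsilon > 0$, 

$$\lim_{l \to \infty} P\left(\max_{n \ge l} \|\hat\theta_{\alpha,n} - \theta_\alpha^*\| > \epsilon\right) = 0.$$

Using this and (15), we have 

$$\begin{aligned}
&P\left(\max_{n \ge l} \left\|\frac{1}{n}\sum_{i=1}^{n}\left\{\partial^2_{\theta\theta^T} H_i(\hat\theta_{\alpha,n}) - E[\partial^2_{\theta\theta^T} H_i(\theta_\alpha^*)]\right\}\right\| > \epsilon\right) \\
&\quad\le P\left(\max_{n \ge l} \|\hat\theta_{\alpha,n} - \theta_\alpha^*\| \ge \eta_\epsilon\right) + P\left(\max_{n \ge l} \|\hat\theta_{\alpha,n} - \theta_\alpha^*\| < \eta_\epsilon,\ \max_{n \ge l} \left\|\frac{1}{n}\sum_{i=1}^{n}\left\{\partial^2_{\theta\theta^T} H_i(\hat\theta_{\alpha,n}) - E[\partial^2_{\theta\theta^T} H_i(\theta_\alpha^*)]\right\}\right\| > \epsilon\right) \\
&\quad\le P\left(\max_{n \ge l} \|\hat\theta_{\alpha,n} - \theta_\alpha^*\| \ge \eta_\epsilon\right) + P\left(\max_{n \ge l} \sup_{\theta \in V_0(\eta_\epsilon)} \left\|\frac{1}{n}\sum_{i=1}^{n}\left\{\partial^2_{\theta\theta^T} H_i(\theta) - E[\partial^2_{\theta\theta^T} H_i(\theta_\alpha^*)]\right\}\right\| > \epsilon\right) \\
&\quad\longrightarrow 0 \quad \text{as } l \to \infty,
\end{aligned}$$

which asserts (14). 

□ 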


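Proof of Theorem 2.2 

Note that $E[\partial_\theta H_i(\theta_\alpha^*)] = E[\partial_\theta d_\alpha(f(\cdot|X), f_{\theta_\alpha^*}(\cdot|X))] = 0$. Since $E[\partial_\theta H_i(\theta_\alpha^*)\,\partial_{\theta^T} H_i(\theta_\alpha^*)]$ is finite by assumption 
A6 and $\{\partial_\theta H_i(\theta_\alpha^*)\}$ is a sequence of i.i.d. random vectors, it follows from the multivariate central limit 
theorem that 

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \partial_\theta H_i(\theta_\alpha^*) \xrightarrow{d} N(0, K_\alpha). \tag{16}$$

Applying Taylor's expansion to $\partial_\theta H_i(\theta)$, we have 

$$0 = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \partial_\theta H_i(\hat\theta_{\alpha,n}) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \partial_\theta H_i(\theta_\alpha^*) + \frac{1}{n}\sum_{i=1}^{n} \partial^2_{\theta\theta^T} H_i(\tilde\theta_{\alpha,n})\,\sqrt{n}\,(\hat\theta_{\alpha,n} - \theta_\alpha^*),$$

where $\tilde\theta_{\alpha,n}$ lies between $\hat\theta_{\alpha,n}$ and $\theta_\alpha^*$. Therefore, Theorem 2.2 is asserted from Lemma 6.1 and (16). □ 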


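We now present derivatives of the pseudo conditional densities stated in subsection 2.2 and some lemmas 
to verify Proposition 2.1. 

• Derivatives of the [NT] density 
Letting 

$$\Delta_0 := \Phi\left(\frac{\mu}{\sigma_u}\right), \quad \Delta_1 := \frac{y - g(x, \beta) + \mu}{\sigma}, \quad \text{and} \quad \Delta_2 := \frac{\mu}{\sigma\lambda} - \frac{\lambda}{\sigma}\,(y - g(x, \beta)),$$

by simple calculation, we can show that 

$$\begin{aligned}
f_\theta &= \frac{1}{\sigma\Delta_0}\,\phi(\Delta_1)\Phi(\Delta_2), \\
\partial_\beta f_\theta &= f_\theta\left\{\frac{\Delta_1}{\sigma} + \frac{\lambda}{\sigma}\frac{\phi(\Delta_2)}{\Phi(\Delta_2)}\right\}\partial_\beta g(x, \beta) := f_\theta D_{1,\beta}\,\partial_\beta g(x, \beta), \\
\partial_\mu f_\theta &= f_\theta\left\{-\frac{\phi(\mu/\sigma_u)}{\sigma_u\Phi(\mu/\sigma_u)} + \frac{1}{\sigma}\left(-\Delta_1 + \frac{1}{\lambda}\frac{\phi(\Delta_2)}{\Phi(\Delta_2)}\right)\right\} := f_\theta D_{1,\mu}, \\
\partial_{\sigma_v} f_\theta &= f_\theta\left\{-\frac{\sigma_v}{\sigma^2} + \frac{\sigma_v}{\sigma^2}\Delta_1^2 + \partial_{\sigma_v}\Delta_2\,\frac{\phi(\Delta_2)}{\Phi(\Delta_2)}\right\} := f_\theta D_{1,\sigma_v}, \\
\partial_{\sigma_u} f_\theta &= f_\theta\left\{-\frac{\sigma_u}{\sigma^2} + \frac{\mu}{\sigma_u^2}\frac{\phi(\mu/\sigma_u)}{\Phi(\mu/\sigma_u)} + \frac{\sigma_u}{\sigma^2}\Delta_1^2 + \partial_{\sigma_u}\Delta_2\,\frac{\phi(\Delta_2)}{\Phi(\Delta_2)}\right\} := f_\theta D_{1,\sigma_u},
\end{aligned} \tag{17}$$

where 

$$\partial_{\sigma_v}\Delta_2 = \frac{\mu}{\sigma\lambda}\left(\frac{1}{\sigma_v} - \frac{\sigma_v}{\sigma^2}\right) + \frac{\lambda}{\sigma}\,(y - g(x, \beta))\left(\frac{\sigma_v}{\sigma^2} + \frac{1}{\sigma_v}\right) := h_v(\mu, \sigma_v, \sigma_u) - \left(\frac{\sigma_v}{\sigma^2} + \frac{1}{\sigma_v}\right)\Delta_2, \tag{18}$$

$$\partial_{\sigma_u}\Delta_2 = -\frac{\mu}{\sigma\lambda}\left(\frac{1}{\sigma_u} + \frac{\sigma_u}{\sigma^2}\right) + \frac{\lambda}{\sigma}\,(y - g(x, \beta))\left(\frac{\sigma_u}{\sigma^2} - \frac{1}{\sigma_u}\right) := h_u(\mu, \sigma_v, \sigma_u) - \left(\frac{\sigma_u}{\sigma^2} - \frac{1}{\sigma_u}\right)\Delta_2. \tag{19}$$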
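
Score expressions of this kind are easy to get wrong by a sign, so a finite-difference check is a cheap safeguard. The Python sketch below assumes the normal-truncated normal density $f_\theta = \phi(\Delta_1)\Phi(\Delta_2)/(\sigma\Delta_0)$ with $\sigma^2 = \sigma_v^2 + \sigma_u^2$, $\lambda = \sigma_u/\sigma_v$, $\Delta_0 = \Phi(\mu/\sigma_u)$, $\Delta_1 = (\epsilon + \mu)/\sigma$, and $\Delta_2 = \mu/(\sigma\lambda) - \lambda\epsilon/\sigma$, and takes the simplest frontier $g(x, \beta) = \beta$ so that $\partial_\beta g = 1$. All function names and parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm


def nt_parts(eps, mu, sigma_v, sigma_u):
    """Return sigma, lambda, Delta_0, Delta_1, Delta_2 for the [NT] density."""
    sigma = np.hypot(sigma_v, sigma_u)
    lam = sigma_u / sigma_v
    delta0 = norm.cdf(mu / sigma_u)
    delta1 = (eps + mu) / sigma
    delta2 = mu / (sigma * lam) - lam * eps / sigma
    return sigma, lam, delta0, delta1, delta2


def nt_log_density(eps, mu, sigma_v, sigma_u):
    sigma, _, d0, d1, d2 = nt_parts(eps, mu, sigma_v, sigma_u)
    return norm.logpdf(d1) + norm.logcdf(d2) - np.log(sigma * d0)


def score_beta(eps, mu, sigma_v, sigma_u):
    """D_{1,beta}: derivative of log f w.r.t. beta when eps = y - beta."""
    sigma, lam, _, d1, d2 = nt_parts(eps, mu, sigma_v, sigma_u)
    return d1 / sigma + lam / sigma * norm.pdf(d2) / norm.cdf(d2)


def score_mu(eps, mu, sigma_v, sigma_u):
    """D_{1,mu}: derivative of log f w.r.t. mu."""
    sigma, lam, d0, d1, d2 = nt_parts(eps, mu, sigma_v, sigma_u)
    return (-norm.pdf(mu / sigma_u) / (sigma_u * d0) - d1 / sigma
            + norm.pdf(d2) / (sigma * lam * norm.cdf(d2)))


y, beta, mu, sv, su, h = 0.4, 1.0, 0.2, 0.3, 0.5, 1e-6
fd_beta = (nt_log_density(y - (beta + h), mu, sv, su)
           - nt_log_density(y - (beta - h), mu, sv, su)) / (2 * h)
fd_mu = (nt_log_density(y - beta, mu + h, sv, su)
         - nt_log_density(y - beta, mu - h, sv, su)) / (2 * h)
print(fd_beta, score_beta(y - beta, mu, sv, su))
print(fd_mu, score_mu(y - beta, mu, sv, su))
```

The same pattern extends to the derivatives with respect to $\sigma_v$ and $\sigma_u$ by perturbing the corresponding argument.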


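• Derivatives of the [NE] density 

Denote $\xi := -\dfrac{y - g(x, \beta)}{\sigma_v} - \dfrac{\sigma_v}{\sigma_u}$. Then, we can express that 

$$\begin{aligned}
f_\theta &= \frac{1}{\sigma_u}\,\Phi(\xi)\exp\left(-\frac{\sigma_v}{\sigma_u}\,\xi - \frac{\sigma_v^2}{2\sigma_u^2}\right), \\
\partial_\beta f_\theta &= f_\theta\left\{\frac{1}{\sigma_v}\frac{\phi(\xi)}{\Phi(\xi)} - \frac{1}{\sigma_u}\right\}\partial_\beta g(x, \beta) := f_\theta D_{2,\beta}\,\partial_\beta g(x, \beta), \\
\partial_{\sigma_v} f_\theta &= f_\theta\left\{\frac{\sigma_v}{\sigma_u^2} - \left(\frac{\xi}{\sigma_v} + \frac{2}{\sigma_u}\right)\frac{\phi(\xi)}{\Phi(\xi)}\right\} := f_\theta D_{2,v}, \\
\partial_{\sigma_u} f_\theta &= f_\theta\left\{-\frac{1}{\sigma_u} + \frac{\sigma_v}{\sigma_u^2}\left(\xi + \frac{\phi(\xi)}{\Phi(\xi)}\right)\right\} := f_\theta D_{2,u}.
\end{aligned} \tag{20}$$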
Lemma 6.2. For all k > 0, l > 0, m > 0 and n > 0, we have 


sup{$'=(x)|xre—^}<1. 


Proof. Using the facts that <i>(x) < e 2 “^ for x < 0 and = 0(—x) as x —>■ — 00 , we have 


x^R 


□ 

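Lemma 6.3. Assume that $\Theta$ is compact and $g(x, \beta)$ is two times differentiable w.r.t. $\beta$ for all $x$. Under the 
cases of [NT] and [NE], we have that for $\alpha > 0$, 

$$f_\theta^{\alpha-1}(y|x)\,\|\partial_\theta f_\theta(y|x)\| \lesssim 1 + \sup_{\theta \in \Theta}\|\partial_\beta g(x, \beta)\| \tag{21}$$

and 

$$f_\theta^{\alpha-1}(y|x)\,\|\partial^2_{\theta\theta^T} f_\theta(y|x)\| \lesssim 1 + \sup_{\theta \in \Theta}\|\partial_\beta g(x, \beta)\,\partial_{\beta^T} g(x, \beta)\| + \sup_{\theta \in \Theta}\|\partial^2_{\beta\beta^T} g(x, \beta)\|. \tag{22}$$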


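Proof. We only consider the case of [NT] because the result for the case of [NE] can be deduced by substituting 
$D_{1,\cdot}$ with $D_{2,\cdot}$ and following essentially the same arguments as below. 

Due to the compactness of $\Theta$, $\sigma^{\alpha+1}\Delta_0^{\alpha}$ is bounded away from zero and $\lambda$ is bounded above (see Remark 
2). Thus, using Lemma 6.2, we have 

$$\begin{aligned}
f_\theta^{\alpha-1}\|\partial_\beta f_\theta\| &= \frac{1}{\sigma^{\alpha+1}\Delta_0^{\alpha}}\,\phi^{\alpha}(\Delta_1)\Phi^{\alpha}(\Delta_2)\left|\Delta_1 + \lambda\frac{\phi(\Delta_2)}{\Phi(\Delta_2)}\right|\|\partial_\beta g(x, \beta)\| \\
&\lesssim \left\{|\Delta_1|\phi^{\alpha}(\Delta_1) + \Phi^{\alpha}(\Delta_2)\frac{\phi(\Delta_2)}{\Phi(\Delta_2)}\right\}\|\partial_\beta g(x, \beta)\| \\
&\le \left\{\sup_{s}|s|\phi^{\alpha}(s) + \sup_{s}\Phi^{\alpha}(s)\frac{\phi(s)}{\Phi(s)}\right\}\|\partial_\beta g(x, \beta)\| \qquad (23) \\
&\lesssim \sup_{\theta \in \Theta}\|\partial_\beta g(x, \beta)\|. \qquad (24)
\end{aligned}$$

Since $h_v$ and $h_u$ given in (18) and (19), respectively, are continuous and $\Theta$ is compact, $h_v$ and $h_u$ are bounded 
below and above. Thus, it is readily shown that 

$$|D_{1,\mu}| + |D_{1,\sigma_v}| + |D_{1,\sigma_u}| \lesssim 1 + |\Delta_1| + |\Delta_1|^2 + (1 + |\Delta_2|)\frac{\phi(\Delta_2)}{\Phi(\Delta_2)}.$$

Using this and Lemma 6.2, we can show that 

$$f_\theta^{\alpha-1}\left\{|\partial_\mu f_\theta| + |\partial_{\sigma_v} f_\theta| + |\partial_{\sigma_u} f_\theta|\right\} \lesssim 1 + \sup_{z}\,(|z| + |z|^2)\phi^{\alpha}(z) + \sup_{z}\Phi^{\alpha}(z)(1 + |z|)\frac{\phi(z)}{\Phi(z)} \lesssim 1, \tag{25}$$

which together with (24) implies (21). 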
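
The key point in this bound is that the inverse Mills ratio $\phi(z)/\Phi(z)$, which grows linearly as $z \to -\infty$, is tamed by the factor $\Phi^{\alpha}(z)$. A small numerical illustration in Python follows; the grid, the choice $\alpha = 0.3$, and the function name are arbitrary, and log-scale evaluation is used only to avoid underflow in the far tails.

```python
import numpy as np
from scipy.stats import norm


def weighted_mills(z, alpha):
    """Phi(z)^alpha * (1 + |z|) * phi(z) / Phi(z), evaluated on the log scale."""
    log_cdf = norm.logcdf(z)
    log_ratio = norm.logpdf(z) - log_cdf  # log of the inverse Mills ratio
    return np.exp(alpha * log_cdf + log_ratio) * (1.0 + np.abs(z))


z = np.linspace(-40.0, 40.0, 80001)
values = weighted_mills(z, alpha=0.3)
print(values.max(), z[values.argmax()])
```

On this grid the function peaks at a moderate negative argument and vanishes in both tails, in line with the finite supremum used in (25).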


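We next derive an upper bound of the second derivatives. Since $\Theta$ is compact and 

$$\partial_\beta D_{1,\beta} = -\frac{1}{\sigma^2}\left\{1 + \lambda^2\left(\Delta_2\frac{\phi(\Delta_2)}{\Phi(\Delta_2)} + \frac{\phi^2(\Delta_2)}{\Phi^2(\Delta_2)}\right)\right\}\partial_\beta g(x, \beta),$$

we have 

$$|D_{1,\beta}| + |D_{1,\beta}^2| \lesssim |\Delta_1| + |\Delta_1|^2 + \frac{\phi(\Delta_2)}{\Phi(\Delta_2)} + \frac{\phi^2(\Delta_2)}{\Phi^2(\Delta_2)}$$

and 

$$\|\partial_\beta D_{1,\beta}\| \lesssim \left(1 + |\Delta_2|\frac{\phi(\Delta_2)}{\Phi(\Delta_2)} + \frac{\phi^2(\Delta_2)}{\Phi^2(\Delta_2)}\right)\|\partial_\beta g(x, \beta)\|.$$

Hence, using these and a method similar to that for (24), we have that 

$$\begin{aligned}
f_\theta^{\alpha-1}\|\partial^2_{\beta\beta^T} f_\theta\| &\le f_\theta^{\alpha}\left\|D_{1,\beta}^2\,\partial_\beta g(x, \beta)\,\partial_{\beta^T} g(x, \beta) + \partial_\beta g(x, \beta)\,\partial_{\beta^T} D_{1,\beta} + D_{1,\beta}\,\partial^2_{\beta\beta^T} g(x, \beta)\right\| \\
&\lesssim \sup_{\theta \in \Theta}\|\partial_\beta g(x, \beta)\,\partial_{\beta^T} g(x, \beta)\| + \sup_{\theta \in \Theta}\|\partial^2_{\beta\beta^T} g(x, \beta)\|.
\end{aligned} \tag{26}$$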


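Furthermore, noting that 

$$\begin{aligned}
\partial_\mu D_{1,\beta} &= \frac{1}{\sigma^2}\left\{1 - \Delta_2\frac{\phi(\Delta_2)}{\Phi(\Delta_2)} - \frac{\phi^2(\Delta_2)}{\Phi^2(\Delta_2)}\right\}, \\
\partial_{\sigma_v} D_{1,\beta} &= -\frac{\sigma_v}{\sigma^3}\left(\Delta_1 + \lambda\frac{\phi(\Delta_2)}{\Phi(\Delta_2)}\right) + \frac{1}{\sigma}\left\{-\frac{\sigma_v}{\sigma^2}\Delta_1 - \frac{\lambda}{\sigma_v}\frac{\phi(\Delta_2)}{\Phi(\Delta_2)} + \lambda\left(-\Delta_2\frac{\phi(\Delta_2)}{\Phi(\Delta_2)} - \frac{\phi^2(\Delta_2)}{\Phi^2(\Delta_2)}\right)\partial_{\sigma_v}\Delta_2\right\}, \\
\partial_{\sigma_u} D_{1,\beta} &= -\frac{\sigma_u}{\sigma^3}\left(\Delta_1 + \lambda\frac{\phi(\Delta_2)}{\Phi(\Delta_2)}\right) + \frac{1}{\sigma}\left\{-\frac{\sigma_u}{\sigma^2}\Delta_1 + \frac{\lambda}{\sigma_u}\frac{\phi(\Delta_2)}{\Phi(\Delta_2)} + \lambda\left(-\Delta_2\frac{\phi(\Delta_2)}{\Phi(\Delta_2)} - \frac{\phi^2(\Delta_2)}{\Phi^2(\Delta_2)}\right)\partial_{\sigma_u}\Delta_2\right\},
\end{aligned}$$

we can show 

$$\begin{aligned}
f_\theta^{\alpha-1}\left\{\|\partial^2_{\beta\mu} f_\theta\| + \|\partial^2_{\beta\sigma_v} f_\theta\| + \|\partial^2_{\beta\sigma_u} f_\theta\|\right\} &\le f_\theta^{\alpha}\sum_{\theta_i \in \{\mu, \sigma_v, \sigma_u\}}\left\{|D_{1,\beta} D_{1,\theta_i}| + |\partial_{\theta_i} D_{1,\beta}|\right\}\|\partial_\beta g(x, \beta)\| \\
&\lesssim \sup_{\theta \in \Theta}\|\partial_\beta g(x, \beta)\|.
\end{aligned} \tag{27}$$

By a simple calculation, it is straightforward to show that $\partial_{\theta_i} D_{1,\theta_j}$, where $\theta_i, \theta_j \in \{\mu, \sigma_v, \sigma_u\}$, is dominated 
by a polynomial function of $\Delta_1$, $\Delta_2$, and $\phi(\Delta_2)/\Phi(\Delta_2)$. Thus, in a similar fashion to the above, we can 
verify that 

$$f_\theta^{\alpha-1}(y|x)\,|\partial^2_{\theta_i\theta_j} f_\theta(y|x)| \lesssim 1. \tag{28}$$

Combining (26)-(28), we establish (22). 

□ 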


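Proof of Proposition 2.1 

Note that 

$$\int \|f_\theta^{\alpha}(y|X)\,\partial_\theta f_\theta(y|X)\|\,dy \le \int f_\theta^{\alpha-1}(y|X)\,\|\partial_\theta f_\theta(y|X)\|\,f_\theta(y|X)\,dy \le \sup_{y,\theta} f_\theta^{\alpha-1}(y|X)\,\|\partial_\theta f_\theta(y|X)\|.$$

Then, we have 

$$\|\partial_\theta H(\theta)\| = (1 + \alpha)\left\|\int f_\theta^{\alpha}(y|X)\,\partial_\theta f_\theta(y|X)\,dy - f_\theta^{\alpha-1}(Y|X)\,\partial_\theta f_\theta(Y|X)\right\| \lesssim \sup_{y,\theta} f_\theta^{\alpha-1}(y|X)\,\|\partial_\theta f_\theta(y|X)\| \tag{29}$$

and thus, by (21), 

$$\|\partial_\theta H(\theta)\,\partial_{\theta^T} H(\theta)\| = \|\partial_\theta H(\theta)\|^2 \lesssim 1 + \sup_{\theta}\|\partial_\beta g(X, \beta)\,\partial_{\beta^T} g(X, \beta)\|. \tag{30}$$

Next, it follows from (21) that 

$$\sup_{y,\theta} f_\theta^{\alpha-2}(y|X)\,\|\partial_\theta f_\theta(y|X)\,\partial_{\theta^T} f_\theta(y|X)\| = \left(\sup_{y,\theta} f_\theta^{\alpha/2-1}(y|X)\,\|\partial_\theta f_\theta(y|X)\|\right)^2 \lesssim 1 + \sup_{\theta}\|\partial_\beta g(X, \beta)\,\partial_{\beta^T} g(X, \beta)\|.$$

Using this and (22), we have 

$$\begin{aligned}
\frac{1}{1 + \alpha}\,\|\partial^2_{\theta\theta^T} H(\theta)\| &\le \int \left\{\alpha f_\theta^{\alpha-2}(y|X)\,\|\partial_\theta f_\theta(y|X)\,\partial_{\theta^T} f_\theta(y|X)\| + f_\theta^{\alpha-1}(y|X)\,\|\partial^2_{\theta\theta^T} f_\theta(y|X)\|\right\} f_\theta(y|X)\,dy \\
&\quad + |\alpha - 1|\,f_\theta^{\alpha-2}(Y|X)\,\|\partial_\theta f_\theta(Y|X)\,\partial_{\theta^T} f_\theta(Y|X)\| + f_\theta^{\alpha-1}(Y|X)\,\|\partial^2_{\theta\theta^T} f_\theta(Y|X)\| \\
&\lesssim \sup_{y,\theta} f_\theta^{\alpha-2}(y|X)\,\|\partial_\theta f_\theta(y|X)\,\partial_{\theta^T} f_\theta(y|X)\| + \sup_{y,\theta} f_\theta^{\alpha-1}(y|X)\,\|\partial^2_{\theta\theta^T} f_\theta(y|X)\| \\
&\lesssim 1 + \sup_{\theta}\|\partial_\beta g(X, \beta)\,\partial_{\beta^T} g(X, \beta)\| + \sup_{\theta}\|\partial^2_{\beta\beta^T} g(X, \beta)\|.
\end{aligned} \tag{31}$$

Hence, the proposition follows from (30) and (31). 

□ 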
Figure 1: The box plots for the upward (upper panel) and the downward (lower panel) contamination cases. 

Figure 2: The plot of $\log(Y/L)$ against $\log(K/L)$ and the estimated frontier lines. 

Figure 3: Density estimates of TEs (L) and the scatter plot (R) of $(TE_{MD,\alpha=0.3}, TE_{ML})$. 

Figure 4: The plot of the cleaned data with the estimated frontier lines. 

Figure 5: Density estimates of TEs (L) and the scatter plot (R) of $(TE_{MD,\alpha=0.3}, TE_{ML})$ after removing 
very low- or high-performing firms. 

Table 1: Mean (SD/MSE) of the estimates, mean of d, and MSE[TE] when no outliers exist. 

Estimator (alpha)  beta_0          beta_1          sigma^2         gamma           sigma_v^2       sigma_u^2       d               MSE[TE]
MLE                4.950           5.001           1.730           1.122           0.761           0.969           0.404           0.061
                   (0.257/0.069)   (0.171/0.029)   (0.311/0.097)   (0.428/0.184)   (0.158/0.025)   (0.445/0.199)   [1.000]
MDPDE 0.05         4.956           5.001           1.733           1.130           0.760           0.973           0.399           0.058*
                   (0.245/0.062)*  (0.171/0.029)*  (0.307/0.094)   (0.412/0.170)*  (0.156/0.024)*  (0.438/0.193)   [0.986]
MDPDE 0.10         4.954           5.000           1.732           1.127           0.762           0.970           0.397           0.059
                   (0.248/0.064)   (0.171/0.029)   (0.307/0.094)*  (0.413/0.171)   (0.156/0.025)   (0.438/0.192)*  [0.983]*
MDPDE 0.20         4.952           5.001           1.733           1.127           0.762           0.971           0.415           0.060
                   (0.253/0.066)   (0.174/0.030)   (0.317/0.101)   (0.427/0.183)   (0.161/0.026)   (0.453/0.206)   [1.026]
MDPDE 0.30         4.952           5.000           1.738           1.132           0.760           0.978           0.430           0.061
                   (0.263/0.071)   (0.176/0.031)   (0.330/0.109)   (0.445/0.198)   (0.166/0.028)   (0.469/0.220)   [1.065]
MDPDE 0.50         4.945           4.999           1.743           1.137           0.758           0.985           0.481           0.066
                   (0.287/0.085)   (0.184/0.034)   (0.370/0.137)   (0.503/0.253)   (0.183/0.034)   (0.524/0.274)   [1.191]
MDPDE 0.75         4.927           4.998           1.741           1.131           0.759           0.983           0.542           0.074
                   (0.323/0.109)   (0.194/0.038)   (0.412/0.170)   (0.580/0.337)   (0.204/0.042)   (0.584/0.341)   [1.341]
MDPDE 1.00         4.916           4.999           1.750           1.146           0.755           0.995           0.606           0.080
                   (0.344/0.125)   (0.207/0.043)   (0.453/0.205)   (0.658/0.433)   (0.224/0.050)   (0.646/0.416)   [1.499]

Notes: The values in square brackets show the ratios of the mean of d to that of the ML estimate 

Table 2: Mean (SD/MSE) of the estimates, mean of d and MSE[TE] when upward outliers exist: n° = 3, p„ = 5. 

Estimator (alpha)  beta_0          beta_1          sigma^2         gamma           sigma_v^2       sigma_u^2       d               MSE[TE]
MLE                4.243           5.004           1.267           0.012           1.258           0.009           1.213           0.285
                   (0.125/0.589)   (0.177/0.031)   (0.091/0.242)   (0.09/1.313)    (0.069/0.262)   (0.08/0.988)    [1.000]
MDPDE 0.05         4.382           5.008           1.297           0.196           1.160           0.137           1.044           0.219
                   (0.269/0.454)   (0.173/0.030)   (0.210/0.249)   (0.324/1.023)   (0.126/0.184)   (0.279/0.823)   [0.860]
MDPDE 0.10         4.548           5.009           1.384           0.436           1.044           0.340           0.814           0.165
                   (0.348/0.326)   (0.175/0.031)   (0.279/0.212)   (0.458/0.727)   (0.164/0.113)   (0.409/0.602)   [0.671]
MDPDE 0.20         4.830           5.012           1.608           0.898           0.853           0.755           0.528           0.088
                   (0.330/0.138)   (0.173/0.030)*  (0.340/0.136)   (0.513/0.329)   (0.190/0.047)   (0.506/0.316)   [0.435]
MDPDE 0.30         4.915           5.009           1.698           1.065           0.786           0.912           0.468           0.069
                   (0.292/0.093)   (0.176/0.031)   (0.344/0.121)*  (0.487/0.245)*  (0.180/0.034)*  (0.500/0.257)*  [0.386]*
MDPDE 0.50         4.940           5.011           1.733           1.124           0.763           0.970           0.487           0.066*
                   (0.290/0.088)*  (0.185/0.034)   (0.368/0.136)   (0.504/0.255)   (0.184/0.034)   (0.525/0.277)   [0.401]
MDPDE 0.75         4.923           5.011           1.738           1.127           0.762           0.976           0.553           0.074
                   (0.323/0.110)   (0.194/0.038)   (0.414/0.172)   (0.583/0.341)   (0.205/0.042)   (0.592/0.350)   [0.456]
MDPDE 1.00         4.915           5.010           1.750           1.143           0.758           0.992           0.611           0.078
                   (0.344/0.126)   (0.205/0.042)   (0.454/0.206)   (0.652/0.425)   (0.223/0.050)   (0.648/0.419)   [0.504]

Table 3: Mean (SD/MSE) of the estimates, mean of d and MSE[TE] when downward outliers exist: n° = 3. 

Estimator (alpha)  beta_0          beta_1          sigma^2         gamma           sigma_v^2       sigma_u^2       d               MSE[TE]
MLE                5.303           4.983           2.617           1.955           0.554           2.063           1.102           0.056
                   (0.133/0.110)   (0.177/0.032)   (0.260/0.819)   (0.301/0.732)   (0.095/0.048)   (0.317/1.230)   [1.000]
MDPDE 0.05         5.223           4.996           2.354           1.739           0.596           1.758           0.796           0.052
                   (0.131/0.067)   (0.173/0.030)   (0.238/0.421)   (0.279/0.419)   (0.099/0.034)   (0.300/0.664)   [0.723]
MDPDE 0.10         5.151           5.003           2.149           1.558           0.638           1.511           0.571           0.049*
                   (0.143/0.043)*  (0.172/0.029)*  (0.259/0.226)   (0.295/0.249)   (0.110/0.025)   (0.333/0.372)   [0.518]
MDPDE 0.20         5.047           5.005           1.916           1.328           0.700           1.216           0.423           0.052
                   (0.204/0.044)   (0.172/0.030)   (0.312/0.125)   (0.377/0.172)*  (0.137/0.021)*  (0.417/0.220)   [0.384]
MDPDE 0.30         4.998           5.003           1.823           1.230           0.725           1.098           0.419           0.057
                   (0.238/0.057)   (0.174/0.030)   (0.341/0.121)*  (0.424/0.185)   (0.148/0.023)   (0.454/0.216)*  [0.380]*
MDPDE 0.50         4.953           5.000           1.752           1.157           0.735           1.017           0.465           0.067
                   (0.286/0.084)   (0.182/0.033)   (0.405/0.164)   (0.497/0.247)   (0.171/0.029)   (0.513/0.264)   [0.422]
MDPDE 0.75         4.922           4.997           1.695           1.132           0.701           0.994           0.546           0.077
                   (0.332/0.116)   (0.201/0.040)   (0.539/0.294)   (0.582/0.339)   (0.218/0.050)   (0.584/0.341)   [0.496]
MDPDE 1.00         4.904           4.983           1.643           1.129           0.648           0.995           0.649           0.088
                   (0.374/0.149)   (0.272/0.074)   (0.687/0.482)   (0.672/0.452)   (0.294/0.096)   (0.657/0.432)   [0.590]

Table 4: Descriptive statistics of variables used in the empirical study (n = 2,031) 

Variable                        Mean        Median      S.D.         Max            Min
Y (Value-added, Thous. KRW)     19,290.4    6,303.2     75,198.1     1,756,980.8    45.0
K (Capital stock, Thous. KRW)   48,202.8    11,686.4    222,752.3    3,944,656.7    28.1
L (Number of employees)         203.6       97.0        507.6        11,156.0       3.0






Table 5: Estimation results of Cobb-Douglas production function 

Estimator      beta_0          beta_1          sigma^2         gamma           sigma_v^2   sigma_u^2
QMLE           7.450 (0.125)   0.354 (0.010)   0.570 (0.014)   1.692 (0.075)   0.148       0.423
alpha = 0.05   7.303 (0.114)   0.363 (0.010)   0.463 (0.011)   1.508 (0.065)   0.141       0.322
alpha = 0.10   7.179 (0.109)   0.369 (0.009)   0.384 (0.010)   1.345 (0.065)   0.137       0.247
alpha = 0.20   7.022 (0.107)   0.378 (0.009)   0.298 (0.011)   1.151 (0.079)   0.128       0.170
alpha = 0.30   6.929 (0.110)   0.382 (0.009)   0.250 (0.012)   1.010 (0.096)   0.124       0.126
alpha = 0.40   6.858 (0.115)   0.384 (0.009)   0.213 (0.014)   0.840 (0.126)   0.125       0.088
alpha = 0.50   6.773 (0.133)   0.386 (0.010)   0.175 (0.019)   0.552 (0.233)   0.134       0.041

Notes: the figures in parentheses denote standard errors. 
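
As a quick reading aid for Table 5, the reported columns are linked by $\sigma^2 = \sigma_v^2 + \sigma_u^2$ and $\gamma = \sigma_u/\sigma_v$. The short Python check below recomputes both from the printed variance components; the values are copied from the table, and the function name and tolerances are arbitrary choices that only allow for three-decimal rounding.

```python
import math

# (sigma^2, gamma, sigma_v^2, sigma_u^2) as printed in Table 5
table5 = {
    "QMLE": (0.570, 1.692, 0.148, 0.423),
    "alpha = 0.05": (0.463, 1.508, 0.141, 0.322),
    "alpha = 0.10": (0.384, 1.345, 0.137, 0.247),
    "alpha = 0.20": (0.298, 1.151, 0.128, 0.170),
    "alpha = 0.30": (0.250, 1.010, 0.124, 0.126),
    "alpha = 0.40": (0.213, 0.840, 0.125, 0.088),
    "alpha = 0.50": (0.175, 0.552, 0.134, 0.041),
}


def discrepancies(row):
    """Absolute gaps between printed and recomputed sigma^2 and gamma."""
    sigma2, gamma, sv2, su2 = row
    return abs(sigma2 - (sv2 + su2)), abs(gamma - math.sqrt(su2 / sv2))


for name, row in table5.items():
    gap_sigma2, gap_gamma = discrepancies(row)
    print(f"{name:13s} sigma^2 gap = {gap_sigma2:.4f}, gamma gap = {gap_gamma:.4f}")
```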

Table 6: The MCS test results for selecting the optimal alpha 

T0             sim(T0, T1)   max(sim*(T0, T1))   H0
QMLE           0.04588       0.01533             Rej.
alpha = 0.05   0.03808       0.02058             Rej.
alpha = 0.10   0.03117       0.02302             Rej.
alpha = 0.20   0.02233       0.01849             Rej.
alpha = 0.30   0.01626       0.02756             Acc.
alpha = 0.40   0.01007       0.01944             Acc.
alpha = 0.50   0             0                   Acc.

Notes: T1 denotes the MDPD estimator with alpha = 0.5 

Table 7: Estimation results of Cobb-Douglas production function after removing very low- or high-performing firms 

Estimator      beta_0          beta_1          sigma^2         gamma           sigma_v^2   sigma_u^2
QMLE           7.150 (0.123)   0.365 (0.010)   0.303 (0.020)   0.977 (0.129)   0.155       0.148
alpha = 0.05   7.104 (0.119)   0.369 (0.010)   0.291 (0.019)   0.980 (0.122)   0.148       0.143
alpha = 0.10   7.059 (0.115)   0.372 (0.009)   0.279 (0.017)   0.979 (0.117)   0.142       0.137
alpha = 0.20   6.976 (0.112)   0.378 (0.009)   0.254 (0.015)   0.954 (0.113)   0.133       0.121
alpha = 0.30   6.905 (0.112)   0.382 (0.009)   0.227 (0.014)   0.885 (0.122)   0.127       0.100
alpha = 0.40   6.840 (0.118)   0.384 (0.009)   0.199 (0.016)   0.753 (0.152)   0.127       0.072

Notes: the figures in parentheses denote standard errors. 

Table 8: The MCS test results for selecting the optimal alpha after removing very low- or high-performing firms 

T0             sim(T0, T1)   max(sim*(T0, T1))   H0
QMLE           0.02321       0.06078             Acc.
alpha = 0.05   0.02146       0.02222             Acc.
alpha = 0.10   0.01949       0.03080             Acc.
alpha = 0.20   0.01475       0.03812             Acc.
alpha = 0.30   0.00859       0.02283             Acc.
alpha = 0.40   0             0                   Acc.

Notes: T1 denotes the MDPD estimator with alpha = 0.4 